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1  INTRODUCTION 


During  the  1986  JASON  Summer  Study  a  group  of  JASONs  undertook  to 
examine,  under  the  sponsorship  of  the  Department  of  Energy  and  DARPA, 
the  utility  of  cellular  automata  in  physical  science  calculations,  especially  in 
fluid  dynamics.  We  expanded  the  scope  of  our  study  to  include  several  related 
topics  which  we  concluded  would  be  of  some  interest  to  our  sponsors.  These 
include:  (1)  a  comparison  of  cellular  automata  (CA)  techniques  to  “conven¬ 
tional”  methods  of  solving  the  partial  differential  equations  of  fluid  dynamics 
or  other  physical  situations;  and  (2)  the  utility  and  status  of  using  parallel 
or  concurrent  processing  machines  for  doing  either  CA  or  conventional  fluids 
calculations. 

To  assist  our  study  we  hosted  a  series  of  external  briefers  who  kindly 
gave  of  their  time  and  expertise.  On  the  subject  of  CA  and  applications  to 
fluid  dynamics  we  heard  from  Dr.  Jay  Boris  of  the  Naval  Research  Labo¬ 
ratory  who  spoke  on  “Cellular  Model  for  Tasking  and  Correlation,”  Dr.  T. 
Toffoli  of  MIT  who  spoke  about  “Primitives  of  Computation  and  of  Physics 
as  Applied  to  Cellular  Automata,”  Dr. Gary  Doolen  of  the  Los  Alamcs  Na¬ 
tional  Laboratory  who  addressed  us  on  the  topic  of  “Lattice  Gases,”  Dr.  S. 
Wolfram  of  the  University  of  Illinois  speaking  on  “CA  and  Hydrodynamics,” 
Dr.  S.  Omohundro  of  Illinois  speaking  on  “Applications  of  the  Connection 
Machine  Architecture,”  Dr.  B.  Nemnich  who  spoke  on  “The  Connection 
Machine,”  and  Dr.  P.  Collela  from  the  Lawrence  Livermore  National  Lab¬ 
oratory  speaking  on  “Multiple  Scale  Problems  in  Hydrodynamics.”  On  the 
subject  of  parallel  processing  we  heard,  in  addition  to  the  talks  of  Nemnich 
and  Omohundro,  from  Dr.  J.  Barhen  of  Oak  Ridge  National  Laboratory  on 
“Hypercube  Computer:  Architecture  and  Algorithms  for  Advanced  Applica¬ 
tions,”  and  from  Dr.  J.  Fier  of  the  AMETEK  Computer  Research  Division 
on  “Hypercube  Architecture  and  Applications.”  To  all  these  workers  in  the 
field  we  give  our  thanks  for  their  assistance. 

The  work  reported  on  in  the  present  study  was  begun  at  JASON  in  the 
summer  of  1986.  Since  that  time  there  has  been  a  considerable  maturation  of 
the  field  of  cellular  automata.  The  reader  desiring  further  background  may 
refer  to  two  excellent  review  volumes: 
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1.  Complex  Systems  volume  1,  no.  4  (August  1987);  this  volume  is  based 
largely  on  presentations  of  a  workshop  held  in  Santa  Fe,  NM  in  October 
of  1986. 

2.  Lattice  Gas  Methods  for  Partial  Differential  Equations,  edited  by  G. 
Dollan  et  al.  (Addison- Wesley,  N.Y.,  1989). 

Our  own  work  has  been  organized  along  the  following  lines: 


•  We  have  examined  the  relationship,  in  terms  of  effort  and  efficiency,  of 
doing  fluid  dynamics  calculations  using  cellular  automata  versus  more 
conventional  spectral  or  finite  difference  methods.  We  conclude  that 
cellular  automata  calculations  are  likely  to  be  competitive  with  stan¬ 
dard  finite  element  or  spectral  methods  for  the  Navier-Stokes  equations 
primarily  for  low  Reynolds  numbers  and  Mach  numbers.  An  exception 
to  this  may  occur  in  complex  geometries  or  with  boundary  conditions 
where  conventional  methods  are  often  quite  difficult. 

•  We  have  reviewed  the  derivation  of  ordinary  Navier-Stokes  Newtonian 
fluid  dynamics  from  kinetic  theory  and  compared  it  to  the  derivation  of 
fluid  dynamics  from  CA.  Only  for  low  Mach  numbers  do  we  have  some 
confidence  that  fluid  dynamics  is  being  simulated  in  these  calculations. 

•  We  have  looked  into  two  schemes  for  carrying  out  CA  calculations  in 
three  dimensions -one  uses  the  notion  of  tiling  or  covering  3-D  space 
with  a  quasi-periodic  lattice  of  Penrose  type,  and  the  other  investigates 
the  idea  of  doing  CA  computations  in  dimensions  larger  than  three  and 
then  projecting  the  results  back  into  three  dimensions.  In  each  case 
the  issue  is  achieving  enough  structure  in  the  underlying  covering  of 
3-D  space  to  assure  correct  tensorial  characteristics  of  the  quantities 
entering  the  Navier  Stokes  equations. 

•  We  have  formulated  in  an  abstract  fashion  the  problem  of  represent¬ 
ing  in  a  physical  3-D  computational  structure  2-D  highly  parallel  CA 
computations. 


Perhaps  it  is  useful  to  define  what  we  mean  by  some  of  the  terms  used 
already  in  this  introduction,  especially  those  which  will  appear  throughout 
the  report.  First  of  all,  we  need  to  address  the  meaning  of  cellular  automata. 
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The  idea  of  cellular  automata  is  that  by  using  very  restricted  information 
on  the  position  and  velocities  of  individual  particles  in  their  microscopic 
collision  and  interaction  one  can,  nonetheless,  arrive  at  a  good  representation 
of  macroscopic  dynamical  equations  such  as  the  Navier-Stokes  equations, 
since  the  latter  involve  averaging  over  larger  numbers  of  individual  particles. 
Furthermore,  the  averaging  is  over  space  and  time  scales  large  compared  to 
the  microscopic  dynamics,  so  one  might  think  that  a  crude  representation  of 
the  latter  could  result  in  realistic  and  acceptable  macroscopic  physics.  The 
generality  of  macroscopic  equations  such  as  the  diffusion  equation  or  the  fluid 
dynamics  equations  which  are  parametrized  by  a  few  transport  coefficients 
in  which  all  the  microphysics  is  buried  would  also  support  this  point  of  view. 
The  key  notion  of  CA,  as  opposed  to  the  ideas  of  molecular  dynamics,  is 
to  very  crudely  represent  each  individual  particle  motion.  Crude  here  means 
giving  the  position  on  a  simple  lattice  covering  the  space  of  interest  and  giving 
the  velocity  as  one  of  a  few  choices  such  as  plus  or  minus  unity.  In  other 
words,  by  representing  the  phase  space  coordinates  of  individual  particles  by 
only  a  few  bits  of  information,  one  hopes  that  the  aggregate  average  needed 
to  construct  the  macroscopic  velocity  or  density  is  accurately  and  efficiently 
computable. 
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2  CELLULAR  AUTOMATA  RULE  FIND¬ 
ING  FOR  USE  IN  3-D  FLUID  DYNAM¬ 
ICS 


There  is  some  difficulty  in  finding  simple  cellular  automata  dynamics 
in  three  dimensions  which  adequately  simulate  the  Navier-Stokes  equation. 
From  a  purely  formal  (i.e.,  nonphysical)  point  of  view,  there  are  a  number 
of  ways  of  overcoming  these  difficulties,  and  as  we  will  show,  many  of  the 
purely  formal  procedures  have  a  straightforward  physical  interpretation.  As 
a  consequence  of  these  investigations,  we  can  present  what  appears  to  be  the 
simplest,  and  most  straightforward,  cellular  automata  simulation  of  3-D  fluid 
motion. 

To  begin,  whether  in  two-  or  three-dimensions,  our  lattice  sites  will  al¬ 
ways  be  the  points  with  all  integer  coordinates,  i.e.,  the  usual  integral  lat¬ 
tice.  In  addition  are  given  a  collection  of  vectors  e1,  e2,  ...  eN  each  vector 
belonging  to  the  lattice.  In  the  list,  a  given  vector  may  occur  repeatedly. 
At  any  instant  of  time,  the  total  state  of  our  automaton  is  described  by 
an  TV-dimensional  vector  of  zeros  and  ones  given  for  each  lattice  site.  In 
other  words,  the  state  of  our  automata  at  time  t  is  described  by  specifying 
Aa(x,  t),  a  —  1, 2, ...  TV,  and  x  running  through  the  lattice  sites.  For  fixed  x 
and  t,  the  vector  [Ai(x,  t),  ^(x,  t), . . . ,  A;v(z,  f)]  is  called  the  state  vector  at 
site  x,  time  t. 

The  evolution  or  dynamics  in  our  automaton  is  given  ac  follows: 

Aa(x,t  +  1)  =  Fa, a  =  1,2,. . .  TV, 

where  each  Fa  is  a  fixed  function  of  the  state  vectors  of  the  automaton  at  time 
t  at  sites  in  a  fixed  neighborhood  of  x.  What  we  mean  to  say  here  is  simply 
that  the  rule  for  advancing  in  time  is  the  same  function  of  neighboring  states 
at  each  site  and  all  times.  You  may  think  of  the  Fa  as  including  both  the 
collision  laws  and  the  motion  of  the  more  conventional  description  of  cellular 
automata,  but  in  general  they  have  no  such  simple  physical  description. 

In  order  to  bring  out  clearly  what  the  essential  properties  of  the  dynamics 
are,  we  will  write: 

Fa  =  Aa(x  -  e°,<)/no  (2-1) 
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and  demand  that  identically  for  all  total  states  of  the  automata  at  time  t, 
we  have: 


£na  =  o 

a 

(2-2) 

£eafia  =  0. 

(2-3) 

The  fla’s  are  simply  the  measure  of  the  difference  in  the  dynamics  of  the 
automaton  from  straight  collisionless  motion  of  the  particles.  They  are  rarely 
mentioned  specifically  for  most  of  the  familiar  constructs,  but  they  do  play 
an  important  formal  role,  as  we  shall  see  momentarily.  It  is  quite  easy  to 
see  that  conditions  in  Equation  (2-2)  are  met,  as  a  matter  of  fact,  for  the 
familiar  square  and  hexagonal  lattice  fluid  dynamics  constructs,  without  even 
specifically  bringing  in  the  Cl's. 

For  our  original  fist  of  vectors  e“,  we  are  going  to  insist  that  all  the  tensors 
Ha  e“i  Ha  c“  <8>  ea,  Ha  e  ®  e  <g>  e“  and  Yla  ea<g>eQ®ea®e“  be  isotropic. 

Thus,  for  example,  if  we  take  in  two  dimensions  the  20  vectors:  (±1,  ±1) 
once  each,  (±1,0)  4  times  each,  and  (0,±1)  4  times  each,  we  satisfy  the 
isotropy  requirements.  In  three  dimensions,  we  may  take  the  24  vectors 
(±1,  ±1,0)  once  each,  (±1,0,0)  twice  each,  (0,  ±1,0)  twice  each,  and  (0,0,  ±1) 
twice  each  to  satisfy  the  isotropy  requirements. 

In  general,  for  a  given  set  of  e’s,  it  seems  quite  easy  to  produce  a  large 
number  of  Fa's  satisfying  the  requirements  (2-1)  and  (2-2)  above.  Subse¬ 
quently,  we  will  produce  Fa's  for  the  e’s  described  in  the  paragraph  immedi¬ 
ately  above. 

Suppose  we  assume  that  our  automata  locally  equilibrate  in  space.  De¬ 
note  by  E  the  admittedly  vague  notion  of  expectation  operator  for  the  local 
spatial  equilibration,  and  put  fa{x,t)  =  £(A0(x,t)).  By  virtue  of  the  re¬ 
quirements  (2-1)  and  (2-2),  we  obtain: 

EMM  +  l)  =  ElJ.-e*,!)  (2-4) 

a  a 

2>“f.(x,t  +  l)  =  £e°f.(x-eV).  (2-5) 
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Denote  n(x,<)  =  £0fa(x,t)  and  n(x,t)  u(x,t)  =  £eafa(x,t).  In  the 
continuum  limit  of  long  times  and  large  lattices,  the  equations  above  become: 


dn 

—  +  V  •  nu  =  0 
at 


(2-6) 


or(n«)  +  £  «■(«■•  Vf.)  =  0. 


It  is  convenient  to  refer  to  n(x,  t)  as  particle  density,  u(x,  t)  as  average  ve¬ 
locity,  and  nu  as  momentum  density.  With  this  convention  the  requirements 
(2-1)  and  (2-2)  provide  for  conservation  of  particle  number  and  momentum. 
For  if  we  pick  an  initial  state  for  the  automaton  in  which  only  finitely  many 
of  the  state  vectors  tire  non-zero,  we  get  from  (2-1)  that 

52*(x,t  +  1)  =  £»(*,<) 


or  in  the  continuum  limit 


^  J  n(x,t)dx  =  0. 


Similarly  (2-2)  gives  conservation  of  momentum  in  time. 

Analogously,  if  the  initial  state  for  the  automaton  is  periodic  in  the  spa¬ 
tial  variables,  so  also  are  subsequent  states  and  with  the  same  periods,  and 
the  total  particle  number  and  total  momentum  over  a  periodic  rectangular 
parallelogram  or  parallelepiped  is  conserved  in  time.  Imposing  periodocity 
is  equivalent,  of  course,  to  running  the  dynamics  on  a  2-  or  3-D  torus. 

At  this  point,  we  make  the  bold  assumption  that  the  functions  fa(x,  t) 
can  be  determined  from  the  equilibrium  parameters  u(x,t)  and  n(x,t)  by 
a  Chapman  Enskog  expansion.  Arguing  just  as  in  Wolfram1  we  end  with 
Navier-Stokes  like  equations  for  a  continuum  fluid. 

For  the  rest  of  this  section,  we  are  going  to  show  that,  at  least  in  some 
cases,  the  formal  procedures  described  above  really  work.  Along  the  way,  we 
will  produce  a  particularly  simple  way  of  running  3-D  fluid  dynamics  at  the 
cellular  automata  level. 

We  will  begin  with  a  4-D  cellular  automaton  with  the  lattice  sites  being 
as  usual  the  points  with  all  integral  coordinates.  Lattice  points  are  labelled 
(x,y,z,w).  We  take  the  sublattice  of  integral  points  for  which  the  coordinate 
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sum  is  even,  which  is  of  index  2  in  the  full  integral  lattice.  This  lattice  is 
connected  with  a  very  interesting  tessalation  of  tne  4-D  space,  but  this  does 
not  concern  us  here.  In  the  sublattice  we  take  all  points  of  least  nonzero 
distance  from  the  origin,  these  being  precisely  the  24  vectors:  (±1,  ±1,0,0) 
and  its  permutations.  These  24  vectors  are  the  list  ea,  a  =  1, 2, . . . ,  TV  =  24. 
It  is  easy  to  verify  that  the  requirements  of  isotropy  are  met  by  this  list,  that 
is,  for  example: 


£(ea),(ea)_,(ea)*(ea)/  =  StJ6ke  +  6ik6jt  +  SitSjk. 


There  are  a  number  of  available  scattering  laws  to  give  a  good  momentum 
scramble,  e.g.: 

A)  Binary:  (1,  -1,  0,  0)  +  (-1,  1,  0,  0)  ~  (0,  0,  1,  -1)  +  (0,  0,  -1,  1) 

B)  Ternary:  (1,  -1,  0,  0)  +  (0,  1,  -1,  0)  +  (-1,  0,  1,  0)  ~  (-1,  1,  0,  0)  + 

(0,  -1,1,  0)  +  (1,0, -1,0) 

C)  Ternary:  (1,  0,  0,  -1)  +  (1,  0,  0,  1)  +  (-1.  1,  0,  0)  ~  (0,  1,  0,  1)  +  (0, 
1,  0,  -1)  +  (1,  -1,  0,  0)  and 

D)  Ternary:  (1,  0,  0,  -1)  +  (1,  0,  0,  1)  +  (-1,  1,  0,  0)  ~  (1,  1,  0,  0)  +  (0, 

1,1,0) +  (0,-1, -1,0) 

and  so  avoid  undesirable  conservation  laws.  If  we  run  a  4-D  cellular  automa¬ 
ton  with  a  suitable  collection  of  such  scattering  laws  on  the  integral  lattice, 
there  is  however  one  inevitable  conservation  law,  which  will  not  concern  us  in 
the  applications  we  make.  This  arises  simply  from  the  fact  that  the  dynam¬ 
ics  takes  place  in  two  uncoupled  sets,  namely  the  sublattice  of  points  whose 
coordinate  sum  is  even  and  the  coset  of  points  whose  coordinate  sum  is  odd. 

The  application  of  this  4-D  cellular  automaton  to  3-D  problems  is  now 
quite  straightforward.  Take  any  initial  state  in  the  3-D  section  (x,y,z,w  =  0), 
and  extend  it  to  4-D  by  repeating  it  in  every  section  w  =  k,  k  an  integer.  If  we 
have  deterministic  scattering  laws,  then  throughout  the  evolution  all  sections 
w  =  k  remain  the  same  as  the  section  w  =  0.  [Even  if  we  have  probabilistic 
scattering  laws,  we  can  maintain  the  identity  of  sections  by  insisting  that  the 
choice  made  at  a  site  (x,y,z,0)  in  the  section  w  =  0  be  repeated  at  all  sites 
(x,y,z,w)]. 
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All  sections  remain  the  same,  thus  macroscopic  particle  density,  momen¬ 
tum  density  and  average  velocity  vector  are  independent  of  w.  since  we  get 
the  same  average  over  a  region  and  any  translation  of  the  region  along  the 
w-axis. 

The  even  sublattice  and  its  colattice  are  now  coupled  together  because 
the  sites  (x,y,z,  even)  and  (x,y,z,  odd)  have  the  same  state. 

If  we  now  look  at  the  4-D  Navier  Stokes  equation  which  our  4-D  automa¬ 
ton  is  presumably  simulating,  then  since  density  and  momentum  density  are 
not  functions  of  w,  the  projection  7 ru  of  the  velocity  vector  u  into  the  section 
w  =  0  together  with  the  original  density  n,  satisfy  a  Navier  Stokes  like  system 
in  3-D. 

Since  the  sections  w  =  k  remain  the  same  throughout  the  evolution  of 
the  4-D  automaton,  the  dynamics  can  be  fully  described  in  the  section  w  = 
0.  How  do  we  do  so?  We  need  to  describe  the  state  at  each  3-D  site  with 
a  24  bit  vector,  corresponding  to  presence  or  absence  of  a  particle  headed 
in  direction  e“;a  =  1,2,..., 24.  Since  we  will  be  concerned  only'  with  the 
projection  of  the  4-D  momentum  into  the  section  w  =  0,  we  project  the  24 
vectors  ea  to  get: 


(±1,  ±1,0) 

once  each 

(±1,0, ±1) 

once  each 

(0,  ±1,  ±1) 

once  each 

(±1,0,0) 

twice  each 

(0,±1,0) 

twice  each 

(0,0, ±1) 

twice  each 

The  versions  .1  the  projected  vectors  which  occur  twice  are  to  be  labelled;  we 
call  one  spin  plus,  the  other  spin  minus,  depending  on  the  sign  of  the  invisible 
4th  coordinate.  The  scattering  laws  such  as  (A),  (B),  (C),  (D)  described 
earlier  are  replaced  by  their  versions  with  the  last  '•oordinate  suppressed, 
but  the  vectors  therein  labelled  spin  plus  or  minus  as  required.  The  3-D 
motion  after  scattering  takes  place  along  the  24  projection  of  the  vectors  e°. 

Here  we  have  precisely  what  was  described  only  as  a  possibility  earlier, 
namely  a  3-D  Navier  Stokes  simulation  using  some  vectors  e°  repeatedly.  The 
3-D  isotropy  is  obvious. 
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The  2-D  simulation  using  20  vectors,  described  earlier,  can  be  simulated 
by  projecting  once  more,  into  the  section  z  =  0.  It  is  relatively  easy  to  see 
that  the  four  rest  particles  arising  from  the  projection  of  (0,0,  ±1),  can  be 
dispensed  with  altogether,  but  we  do  not  go  into  details. 

As  matters  currently  stand,  we  can  run  our  3-D  Navier  Stokes  simulation 
with  a  24-bit  vector  to  describe  the  state  at  each  site.  We  want  to  argue  now 
that  24  can  be  reduced  to  18.  To  do  so,  we  return  to  our  4-D  simulation,  with 
all  sections  w  =  k,  k  an  integer,  the  same.  Suppose  for  the  moment  that  the 
state  vector  at  each  site  in  the  section  w  =  0  is  reflectively  symmetric.  By  this 
we  mean  that  if  one  of  the  particle  directions  such  as  (1,  0,  0,  1)  occurs,  so 
also  does  (1,0,0,  —1).  This  being  the  case,  the  state  vector  can  be  described 
by  an  18  bit  vector.  Suppose  also  that  the  scattering  rules  we  choose  to  use 
are  reflectivity  symmetric.  By  this  we  mean  that  for  every  scattering  law, 
the  law  obtained  by  changing  the  signs  of  last  coordinates  is  also  a  scattering 
law  we  use.  (With  some  care,  probabilistic  scattering  laws  can  be  handled). 
All  this  being  so,  it  is  easy  to  see  that  in  the  evolution  of  the  4-D  automata, 
the  section  w  =  0  remains  reflectively  symmetric,  and  so  the  3-D  dynamics 
needs  only  18  bits  per  site.  At  the  same  time,  we  must  assure  ourselves  that 
we  have  enough  applicable  scattering  laws  to  avoid  undesirable  conservation 
laws.  It  is  clear  that  we  will  not  succeed  by  using  only  those  vectors  which 
occur  singly.  But  the  purpose  of  our  writing  down  scattering  laws  (C)  and 
(D)  earlier  was  to  persuade  the  reader  now  that  enough  scattering  survives, 
these  being  two  instances  of  scattering  laws  in  which  the  spin  plus  and  spin 
minus  directions  occur  in  pairs.  Actually,  a  binary  law,  such  as  (1,  0,  0, 
-fl)  -f  (1,  0,  0,  -1)  (g>  (1,  1,  0,  0)  +  (1,  -1,  0,  0),  will  eliminate  unwanted 
conservations. 

It  is  interesting  to  note  that  in  running  a  4-D  reflectively  symmetric  cellu¬ 
lar  automaton  with  all  sections  w  =  k  the  same,  the  macroscopic  momentum 
density  has  no  component  in  the  direction  of  the  w-axis. 

Now  that  we  are  done  describing  our  3-D  simulation,  it  all  seems  quite 
trivial.  We  have  simply  reversed  a  standard  device  in  fluid  dynamics.  If, 
for  example,  one  wishes  to  describe  a  3-D  planar  flow  about  a  cylindrical 
obstacle,  one  reduces  to  a  2-D  problem.  We  have,  perversely,  taken  a  3- 
D  problem  and  imbedded  it  in  a  4-D  hyperplane  flow  setting,  to  gain  the 
advantage  of  the  useful  lattices  existing  in  four  dimensions. 
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There  is  another  4-D  sublattice  of  the  integral  lattice  which  offers  some 
promise.  Essentially  the  dual  of  the  sublattice  considered  earlier,  it  consists 
of  points  all  of  whose  coordinates  are  even,  or  all  of  whose  coordinates  are 
odd.  It  is  of  degree  8  in  the  full  integral  lattice.  The  vectors  to  consider  in 
this  instance,  24  in  number  also,  are 

(±2, 0,0,0)  and  its  permutations  and 

(±1,±1,±1,±1). 

These  vectors  give  the  desired  isotropy,  and  projecting  into  w  =  0  gives 
two  rest  particles,  8  particle  directions  which  occur  twice,  and  6  particle 
directions  occurring  once.  The  rest  particles  can  be  eliminated;  running 
the  remainder  demanding  reflective  symmetry  as  before  will  give  us  a  3- 
D  simulation  requiring  a  14  bit  vector  to  describe  the  state  at  each  site. 
Scattering  laws  such  as 

(1111)  +  (111  -  1)  +  (-2000)  «->  (-1111)  +  (-111  -  1)  +  (2000) 

will  give  us  some  momentum  scramble,  but  we  do  not  appear  to  have  an 
analogue  of  the  very  useful  scattering  law  (D)  used  in  the  earlier  lattice. 
Without  such  analogue,  the  total  number  of  particles  in  all  of  the  directions 
given  by  the  8  paired  spin  plus  and  spin  minus  directions  is  conserved,  and 
this  gives  us  a  conservation  law.  The  use  of  higher  order  scattering  laws  can 
eliminate  conservation. 

The  fact  that  the  3-D  dynamics  takes  place  in  4  uncoupled  colattices 
presents  no  difficulties;  restrict  one’s  attention  to  the  sublattice:  all  coordi¬ 
nates  even,  or  all  coordinates  odd. 

It  is  appropriate  to  describe  at  this  point  what  we  feel  is  the  right  way  to 
search  for  conservation  laws.  Recall  that  there  is  an  assumption  underlying 
the  whole  discussion  of  cellular  automata,  namely  that  particle  number  and 
momentum  are  the  only  conserved  quantities. 

Now  we  suppose  that  the  automaton  dynamics  is  broken  into  two  parts, 
the  applications  of  the  scattering  laws  followed  by  the  lattice  motions.  At  an 
instant  in  time  we  have  a  state  vector  A(x,t)  which  after  scattering  becomes 
A'(x,f),  followed  by  motion  to  neighboring  sites,  so  that  Aa(x,f  +  1)  =  A'a(x  — 
ea,t).  Now  suppose  there  is  a  vector  v  such  that 

J2v°Aa{x,t)  =  £t>„/^(x,t), 
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no  matter  what  x  and  t,  and  such  that  also  '52avaAa(x,t)  is  not  identically 
zero.  Then  we  have  a  particle  number  conservation  law,  since  now 


^2vaAa(x,t  +  1)  =  ^vaA'a{x  -  eQ,f). 

a 

Integrating  over  all  space  and  taking  expectations  yields 

J  v  •  f(x,t)dx  =  J  v  •  f(x,t  +  l)di 

which  is  not  the  usual  conservation  law  if  v  •  f(x,t)  is  not  a  scalar  multiple 
of  n(x,  t). 

A  law  different  from  the  usual  momentum  conservation  law  would  arise 
from  a  vector  v  such  that 

^TvaeaAa(x,t)  =  X>aeM  'a(x,t). 

a  a 

On  taking  the  dot  product  of  the  last  with  a  fixed  arbitrary  vector,  we  would 
find  a  particle  conservation  law  of  the  sort  considered  just  above,  so  it  suffices 
to  search  for  particle  number  conservation  laws.  As  noted  above,  these  arise 
from  vectors  v  orthogonal  to  A(x,t)  —  A'(x,t)  for  all  possible  x  and  t. 

2.1  3-D  Cellular  Automata 


We  will  now  discuss  in  greater  detail  some  of  the  technical  issues  involved 
in  efficient  use  of  3-D  cellular  automata  for  simulating  fluid  flow,  confining 
our  attention  to  the  two  kinds  of  automata  described  in  Section  2.  The  first 
of  these  is  associated  with  the  4-D  lattice  of  integral  points  whose  coordinate 
sum  is  even  and  for  which  we  pick  lattice  motions  corresponding  to  the  24 
directions  (±1,  ±1,0,0)  and  its  permutations,  this  choice  assuring  sufficient 
isotropy  of  the  flow  to  mimic  the  Navier-Stokes  equations.  The  second  choice 
is  the  lattice  of  integral  points  for  which  all  coordinates  are  even  or  all  are  odd. 
In  this  case  we  pick  24  lattice  motions  associated  to  the  vectors  (±2, 0,0,0) 
and  its  permutations  and  (±1,  ±1,  ±1,  ±1),  enough  again  to  insure  isotropy. 

In  both  cases  we  are  going  to  use  the  4-D  automaton  to  simulate  3-D 
problems.  This  is  effected  by  choosing  initial  conditions  in  the  section  w  = 
0,  and  repeating  them  in  all  the  other  sections  w  =  k.  If  the  scattering  law 
used  at  the  site  (a,  b,  c,  k)  is  the  same  for  all  k  at  each  fixed  instant  in  time, 
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then  all  sections  remain  the  same  in  the  evolution  of  the  automaton,  and  it 
is  enough  to  simply  follow  the  evolution  in  the  section  w  =  0. 

In  order  to  further  reduce  the  computational  burden,  the  following  useful 
device,  already  pointed  out  in  Section  2,  if  a  particle  is  needed  in  direction 
(a,  b,  c,  d),  there  is  also  one  headed  in  the  direction  (a,  b,  c,  -d). 

For  the  first  lattice  mentioned  above,  this  convention  permits  the  state 
at  a  3-D  site  to  be  described  by  an  18-bit  vector.  For  the  second,  the  state 
at  a  site  requires  16  bits.  But  since  it  is  easy  to  see  that  the  effect  on  the 
evolution  of  the  automaton  arising  from  the  pair  of  directions  (0,0,0,  ±2),  if 
these  are  not  used  in  any  scattering  laws,  is  irrelevant  and  does  not  create 
any  unwanted  conservation  in  the  3-D  section  w  =  0,  the  number  of  bits 
necessary  to  describe  the  state  at  a  site  can  be  reduced  to  14. 

For  the  second  lattice  the  pair  of  directions  (0,0,0,  ±2),  which  look  like 
rest  particles  from  the  3-D  point  of  view,  have  simply  been  dropped.  For 
either  lattice,  a  pair  of  directions  such  as  (a,  b,  c,  1)  and  (a,  b,  c,  -1),  both 
of  which  occur  at  a  site,  or  neither,  is  called  a  “married  pair.” 

For  both  lattices,  a  variety  of  scattering  laws  are  available,  much  more  so 
than  in  the  simple  2-D  hexagonal  lattice,  even  with  the  restrictions  created 
by  the  married  pairs.  The  problem,  as  we  see  it,  is  to  have  a  rich  enough 
collection  of  scattering  laws  to  eliminate  any  undesirable  particle  or  momen¬ 
tum  conservation  laws,  but  not  so  many  that  the  computational  logic  at  a 
site  becomes  unwieldy  or  too  slow. 

Part  of  the  difficulty  in  achieving  this  goal  is  a  consequence  of  the  obser¬ 
vation  that  some  randomization  is  needed  in  applying  scattering  laws.  For 
example,  if  the  pair  (1,  1,  0,  0)  and  (-1,  -1,  0,  0)  are  present  at  a  site,  they 
can  be  replaced  by  any  other  velocity  vector  pair  which  sums  to  zero.  How 
shall  the  choice  be  made?  Similarly,  we  may  have  scattering  laws: 


(A)  (1,  1,  0,  0)  +  (-1,  -1,  0,  0)  4-  (1,  0,  -1,  0)  +  (-1,  0,  1,  0) 

(B)  (1,  1,  0,  0)  +  (1,  -1,  0,  0)  ~  (1,  0,  1,  0)  +  (1,  0,  -1,  0). 

If  all  three  velocity  vectors  on  the  left  are  present  at  the  site,  and  none  of 
those  on  the  right,  which  of  the  two  laws  should  be  applied? 
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Recall  that  the  ultimate  macroscopic  features  of  the  flow  arc  to  be  ob¬ 
tained  by  averaging  over  suitable  sub-regions  of  the  3-D  grid.  If  the  scattering 
is  very  limited,  or  not  always  applied  when  available,  then  mean-free-paths 
will  be  quite  long;  consequently  the  suitable  sub-regions  of  the  grid  will  have 
to  be  undesirably  large.  On  the  other  hand,  attempting  to  put  in  all  the 
scattering  available  will  require  intricate  combinatorial  decisions,  and  a  fair 
amount  of  randomization. 


For  these  and  other  reasons,  it  is  desirable  to  keep  the  scattering  laws 
as  simple  as  possible.  The  simplest  scattering  laws  are,  of  course,  binary 
scatters,  and  physically  these  are  the  most  likely  to  occur.  It  is  fortunate 
that  for  the  first  of  the  two  lattices  described  earlier,  with  18  bits  per  site  for 
the  state  vector,  the  use  of  binary  scattering  only  suffice. 


We  describe  the  situation  here  in  a  little  more  detail  now.  We  can  inter¬ 
change  any  of  the  four  following  pairs 


(I)  ( 


100  1  -1  00  10 
-1  0  0  ^  (  -1  1  0  0  ’’  (  -1  0 


10  10—10 
-1  O^-l  0  10 


or  any  of  the  following  three  pairs 

.1  1  0  0  ,  .  1  0  1  0  ,  .  1  0  0  1 

1  —1  0  0  ’  1  0  -1  0  ’  1  0  0  —  1  ’ 


and  analogously, 


(III) 


,1100  01  10w0  10  1 
'  -1  1  0  0  {  0  1  -1  0  (  0  10-1^’ 


(IV) 


,10  10 
(  -1  0  1  0 


), 


,  0  110 
(  0  -1  1  0 


),  ( 


0  0  1 
0  0  1 


j  ),  and 


(I)  gives  us  6  binary  scatters,  (II)  to  (VII)  give  us  3  binary  scatters  each, 
for  a  grand  total  of  24  scatters.  Notice  that  these  scattering  laws  meet  the 
restriction  imposed  by  the  married  pairs.  It  is  also  possible  to  verify  that 
these  binary  laws  are  sufficient  to  avoid  undesirable  conservation  laws  (as 
described  in  Section  2);  it  is  the  exchanges  offered  by  (II)  through  (VII) 
which  mix  up  the  “married  pairs”  with  the  “single”  velocity  vectors. 


We  have  not  investigated  the  question  of  how  many  of  the  binary  scatters 
can  be  dropped  while  still  avoiding  conservation  laws,  though  it  is  clear  that 
some  can  be.  We  could,  for  example,  rather  than  permit  the  interchanging 
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of  all  the  pairs  of  (I),  interchange  only  pair  1  with  pair  2,  2  with  3,  . . pair  4 
with  pair  1.  Even  with  limitations  such  as  this,  the  question  of  which  inter¬ 
change  is  to  be  made  has  to  be  decided  with  a  randomization;  if  the  second 
pair  is  present,  and  not  the  first  or  the  third,  which  of  the  two  interchanges 
should  be  effected? 

It  is  problematic  whether  using  a  thin  set  of  the  binary  scattering  laws, 
big  enough  to  avoid  conservation,  will  give  us  results  as  favorable  as  using 
them  all.  We  do  not  see  any  particular  advantage  to  limited  use,  and  are 
going  to  proceed  on  the  basis  of  using  all  the  binary  laws,  and  on  something 
like  an  equal  footing. 

Now  we  describe  one  scheme  for  using  all  of  the  24  laws  of  the  first  lattice. 
The  laws  are  labeled  from  one  to  twenty-four.  A  random  number  generator, 
somewhere  on  the  side,  picks  a  number  between  one  and  twenty-four,  one 
for  each  site,  and  applies  the  selected  scattering  law  if  it  can  be  applied; 
otherwise  no  scattering  takes  place.  The  random  choice  of  scatter  at  each 
site  is  followed  by  motion  of  particles  to  neighboring  sites,  and  then  repetition 
of  the  procedure. 

In  practice  it  may  be  desirable  to  have  the  random  selection  of  the  integer 
between  one  and  twenty-four  other  than  uniform.  Since  there  are  a  very  large 
number  of  fast  methods  for  generating  random  numbers  with  frequencies,  we 
will  not  go  into  these  questions  here.  There  is,  nonetheless,  quite  a  large  bit  of 
randomization — one  generator  for  each  site.  Possibly  the  randomization  can 
be  carried  out  intrinsically  as  follows:  use  the  bits,  and  their  complements, 
in  the  far  field  of  a  site  to  generate  the  random  number  at  the  site.  There 
are  obvious  risks  and  deficiencies  in  this  procedure,  however. 

For  some  purposes,  the  method  outlined  above  will  not  effect  scattering 
with  sufficient  frequency,  for  an  available  scattering  law  at  a  site  has  roughly 
a  probability  of  1  /24  of  taking  place.  At  the  cost  of  slowing  the  procedure 
down,  a  straightforward  alternative  is  available  as  follows.  After  the  first 
choice  of  scattering  law  is  made  and  applied  if  possible,  and  before  the  particle 
motion  to  neighboring  sites,  a  second  independent  choice  is  made  of  binary 
scatter  and  applied  if  possible.  This  may  be  repeated  as  many  times  as 
necessary  before  motion.  An  attractive  feature  of  this  scheme  is  that  the 
number  of  repetitions  may  be  settable  by  the  program.  Whether  the  scheme 
is  any  better  than  running  the  simulation  faster  with  only  one  randomization 
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per  motion  depends  on  the  time  trade-off  for  scattering  steps  compared  to 
motion  steps. 

There  is  still  another  way  of  proceeding  which  in  a  sense  eliminates  the 
need  for  randomization.  One  may  simply  choose,  once  and  for  all,  at  each 
site  in  the  field  one  of  the  24  binary  scattering  laws,  the  choice  to  be  made 
as  randomly  as  possible.  This  is  probably  the  cheapest  and  fastest  way  to 
simulate,  though  it  might  be  useful  for  some  purposes  if  the  fixed  random 
allocation  of  scattering  laws  to  sites  could  be  easily  changed. 

So  much  for  the  first  lattice.  The  second  lattice  has  a  smaller  state  vector 
at  the  site,  but  is  considerably  poorer  in  scattering  laws.  With  the  restric¬ 
tion  imposed  by  the  “married  pairs,”  there  are  only  three  binary  scatters — 
interchange  any  of  the  following  three  pairs 

m  2  000.  ,0200.  .00  2  0. 

* 1 '  (  -2  0  0  0  *  0  2  0  0  (  0  0  -2  0 

none  of  which  mix  up  married  and  singles.  There  are  no  ternary  laws;  there 
are,  however,  24  special  quaternary  laws  of  which  a  general  example  is 

(1,1, 1,1)  +  (1,1, 1,-1)  +  (-1,-1, -1-1)  «- 

(2, 0,0,0)  +  (—2, 0,0,0)  +  (0,2,  0,0)  +(0,-2, 0,0) 

and  these  24  together  with  the  three  of  (I)  eliminate  unwanted  conservation. 

This  lattice  with  the  27  scattering  laws  above  can  be  used  along  any  of 
the  lines  described  for  the  first  lattice,  though  it  is  not  at  all  clear  what  the 
relative  frequency  of  scatters  coming  from  (I)  with  those  coming  from  (II) 
should  be.  In  general,  especially  in  instances  in  which  overall  density  is  fairly 
large  or  fairly  small,  it  is  going  to  be  difficult  to  use  (II)  and  its  ilk.  Since 
without  (II)  the  marrieds  and  singles  are  never  mixed,  it  seems  clear  that 
somewhat  greater  emphasis  must  be  put  on  finding  applications  of  (II). 

In  the  final  analysis,  the  performance  in  practice  of  either  lattice,  along 
any  of  the  lines  described  above,  must  be  studied  by  actual  computer  simula¬ 
tions.  A  great  deal  will  depend  on  the  size  of  the  region  over  which  averages 
are  taken  in  order  to  obtain  macroscopic  estimates,  on  details  of  the  scatter¬ 
ing  laws  used,  and  on  particle  density. 
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3  A  VLSI  CHIP  FOR  A  CELLULAR  AU¬ 
TOMATA  MACHINE 


3.1  Introduction 


The  4-D  scheme  discussed  in  Section  2  above  turns  out  to  be  very  complex. 
To  explain  a  VLSI  circuit  that  can  implement  it,  we  begin  by  explaining  a  2-D 
square-lattice  system  first.  The  scheme  conserves  particles  and  momentum. 
(However,  it  is  not  isotropic,  and  is  hence  unsuitable  for  simulating  real 
physics.)  The  collision  rules  are  a  function  of  the  incoming  particles  from 
the  North  (N),  South  (S),  East  (E),  and  West  (W)  directions,  a  random 
collision  bit  (C)  that  causes  a  collision  to  occur  if  it  is  possible  to  have  one, 
and  a  random  bit  ( R N),  but  we  will  include  this  function  for  possible  later 
use. 

The  most  straightforward  method  is  a  simple  table  look  up.  Table  3-1 
illustrates  this.  Given  a  set  of  particles,  a  collision  bit,  and  a  random  bit, 
the  output  particles  are  determined. 
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Table  3-1 


INPUT  NSEW 

c 

rn 

OUTPUT  NSEW 

0011 

0 

- 

0011 

1 

- 

- 

- 

0101 

- 

- 

0101 

0110 

- 

- 

0110 

0111 

- 

- 

0111 

1000 

- 

- 

1000 

1001 

- 

- 

1001 

1010 

- 

- 

1010 

1011 

- 

- 

1011 

1100 

0 

- 

1100 

1100 

1 

- 

0011 

1101 

- 

- 

1101 

1110 

- 

- 

1110 

mi 

- 

- 

mi 

Now  consider  a  ‘factored’  solution  to  the  same  problem.  Only  head-on  colli¬ 
sions  can  result  in  scattering  and  if  scattering  is  to  occur,  then  there  must  be 
an  output  channel  for  each  particle  to  occupy.  Thus  there  are  two  ‘scattering’ 
rules: 


\)N  SEW -  C-+E-W 
2 )  N -S -E-W -C  -+  N -S. 

The  first  rule  can  be  read  as 


“If  there  is  a  particle  from  the  North,  and  a  particle  from  the 
South, 

and  no  particle  from  the  East,  and  no  particle  from  the  West,  and 
a  collision  is  to  occur,  then  send  a  particle  out  to  the  East  and 
send  a  particle  out  to  the  West.” 
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The  enabling  condition  paxts  of  these  rules  are  simply  implemented  by 
‘and’  gates. 

Next  we  must  decide  which  rule  to  empioy  if  both  should  be  enabled. 
(Of  course,  in  our  example  this  can  never  actually  occur,  but  it  will  occur  in 
more  complex  systems.)  We  will  employ  a  priority  encoder  to  choose  a  rule, 
with  a  dynamic  priority  assignment  by  the  random  bit. 

Finally,  the  chosen  rule  will  select  which  output  channels  to  block  and 
which  output  channels  to  inject  particles  into. 

The  complete  circuit  is  illustrated  in  Figure  3-1.  It  is,  of  course,  more 
complex  than  a  simple  table  look-up  approach  for  this  very  simple  system. 
However,  it  illustrates  the  method  we  intend  to  use  to  implement  the  4-D 
cellular  automata  scheme. 

3.2  Computational  Cell  for  4-D  Rules 

The  4-D  case  has  a  total  set  of  24  incoming  particles  but  some  (6)  are 
paired,  so  only  18  bits  are  needed  to  specify  the  incoming  and  outgoing  parti¬ 
cles.  In  addition  there  are  66  rules,  so  that  seven  input  priority  specification 
bits  are  needed  to  randomize  rule  selection.  These  will  be  decoded  to  the 
one  in  66  priority  selection  bits.  The  ’C’  bit  (C)  suffices  to  determine  if  an 
allowed  collision  will  occur. 

The  straight-forward  table  look-up  scheme  will  no  longer  work  efficiently 
as  a  table  of  18  *2  *  *(18  +  7  + 1)  =  109  bits  is  needed.  So  we  will  employ  the 
‘factored’  approach  illustrated  above.  The  computational  node  to  accomplish 
this  is  shown  in  Figure  3-2. 

The  rule  enabling  conditions  will  be  calculated  in  the  ‘AND-plane’  of  a 
PLA  (programmed  logic  array).  Thus  we  input  the  collision  enabling  bit  and 
all  of  our  particle  signals,  as  well  as  the  logical  complement  of  each.  This 
is  some  38  total  input  signals.  The  result  is  66  rule  enable  requests  (R). 
Figure  3-3  illustrates  the  connections  for  the  ‘AND-plane.’  From  Figure  3-3, 
it  is  easy  to  count  that  there  are  282  transistors  needed  to  implement  the 
connections.  A  few  more  (~  100)  will  also  be  needed  to  function  as  inverters, 
drivers,  etc.,  for  a  total  of  about  500  transistors. 


RANDOM  PRIORITY  ASSIGNMENT 


COLLISION  RULES  PRIORITY  SELECTION  SWITCH  PARTICLES 


Figure  3-1.  "FACTORED''  Computation  Cell. 
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To  Neighbors 


Figure  3-2.  A  computational  node  for  the  VLSI  chip. 
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MARRIED 

SINGLES  PAIRS 


MARRIAGES  INTER-GROUP  OUTER-GROUP 

AND  HEAD-ON  MIXING  HEAD-ON 

DIVORCES  COLLISIONS  COLLISIONS  COLLISIONS 


1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  2829  3031  32  33 


Figure  3-3.  AND-pIane'  of  Rule  Enable  Signal  Generation  Circuit  (The  complement  of  each  input 
signal  appears  in  the  row  next  to  the  signal.) 
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The  next  step  is  to  select  (randomly)  a  rule  to  fire.  We  will  decode  the 
seven  random  input  bits  into  66  signals  to  control  the  ‘AND-gates’  of  the 
priority  circuit.  This  will  require  about  500  transistors. 

If  a  simple  extension  of  the  priority  circuit  of  Figure  3-1  is  employed,  it 
would  require  three  gates  and  two  inverters  per  stage,  or  about  sixteen  tran¬ 
sistors  per  stage.  This  would  be  about  1000  transistors  for  the  simple  priority 
circuit.  Such  a  priority  circuit  would  be  far  too  slow,  and  if  pipelined  would 
have  too  much  latency.  So  a  ‘carry-look-ahead’  scheme  will  be  employed.  It 
will  require  about  1200  transistors  total  if  full  look-ahead  is  employed  using 
a  gate  fan  of  about  4.  This  combined  with  modest  pipelining  would  require 
about  1500  transistors  total.  So  we  will  assume  this  number. 

The  output  switch  circuitry  w’ill  require  36  gates  or  about  150  transistors. 

Thus  we  see  in  summing  up  that  the  factored  circuitry  will  require  about 
3000  transistors  per  computation  node. 

If  one-half  the  chip  is  dedicated  to  computation  circuitry  and  one-half 
to  memory  (to  store  the  virtual  node  state)  then  about  8x8  (64)  real 
computation  nodes/chip  is  about  the  limit  that  r>e  expected  for  near- 

term  VLSI  technology  (see  Table  3-21 

Table  o-2 


VLSI  TECHNOLOGY  (CMOS) 


Today 

in  about  5  years 

~  cm2  active  area 

~  cm2 

~  200  pins 

~  400  pins 

~  1  Mbit  DRAM 

~  4  Mbit  DRAM 

~  50K  ‘Random’  transistors 

~  200K  Tran. 

~  10  nsec  internal  clock 

~  nsec 

~  80  nsec  external  drive 

~  nsec 

These  nodes  are  laid  out  as  illustrated  in  Figure  3-4. 

The  performance  of  the  proposed  ‘lattice  gas  computer’  is  now  easily 
estimated  (see  Table  3-3). 
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Figure  3-4.  Layout  of  nodes  on  ihe  VLSI  chip. 
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Table  3-3 


PERFORMANCE  ESTIMATES 


Present  Day 

Future 
(5  years) 

512  lattice  points 

64  nodes/chip 

106  chips 

32K  lattice  points/chip 

3  x  lO10  lattice  points  TOTAL 

RS  1011 

10  nsec  update  rate 

64  x  108  updates/chip/sec 

1016  updates/sec 

R*  1017 

About  106  VLSI  chips  is  the  limit  for  a  single  computer  system  (modern 
large  scale  supercomputers  have  about  3  x  105  chips).  At  512  lattice  points 
per  node,  64  nodes/chip  and  106  chips,  up  to  3  x  1010  lattice  points  may  be 
feasible.  With  a  10  nanosecond  update  rate,  64  x  108  updates  per  chip  and 
106  chips  yields  an  update  rate  of  ~  1016  per  second. 

Advances  in  VLSI  technology  over  the  next  five  years  will  allow  ss  3  times 
the  number  of  lattice  points  and  «  10  times  the  internal  clock  rate  to  yield 
rj  10n  lattice  points  updated  at  the  rate  of  «  1017  updates  per  second. 
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4  GENERAL  COMPARISON  OF  CELLU¬ 
LAR  AUTOMATA  AND  CONVENTIONAL 
FLUID-DYNAMICAL  METHODS 


Before  looking  in  detail  at  the  computational  requirements  for  cellular 
automata  and  conventional  fluid  dynamics,  we  first  attempt  to  summarize 
some  of  their  relative  advantages  and  disadvantages  in  a  more  general  way. 


4.1  Cellular  Automata:  Potential  Advantages 


Cellular  automata  have  the  nice  properties  of  elegance,  in  the  sense  that 
their  microphysics  is  immediately  transparent,  and  of  local  simplicity.  They 
are  more  immediately  suitable  for  parallelization  than  conventional  fluid 
methods,  and  they  lend  themselves  to  the  design  of  custom  computer  ar¬ 
chitectures  which  may  give  very  large  increases  in  speed  relative  to  conven¬ 
tional  multipurpose  machines.  In  addition,  because  of  the  “modular”  nature 
of  their  geometry,  they  may  be  able  to  handle  some  types  of  complicated 
boundary  conditions  more  readily  than  conventional  fluid  techniques. 


4.2  Cellular  Automata:  Disadvantages 

On  the  other  hand,  the  macroscopic  physics  which  is  being  described  by 
the  cellular  automata  model  is  not  always  clear.  This  is  the  case,  for  exam¬ 
ple,  for  flows  with  Mach  numbers  approaching  unity  or  for  lattices  lacking 
the  appropriate  isotropy  properties.  Additionally,  cellular  automata  cannot 
be  used  for  hypersonic  flows,  and  they  scale  more  poorly  to  high  Reynolds 
numbers  than  conventional  methods,  as  shown  in  Reference  2. 

C.A.  calculations  suffer  with  respect  to  Navier-Stokes  equations  because 
of  the  need  to  calculate  much  more  than  required  by  the  application  to  fluid 
flow.  Cellular  automata  provide  a  “few  state”  representation  of  configuration 
and  velocity  space,  and  are  basically  very  simple  versions  of  kinetic  theory. 
Thus  one  must  average  over  large  numbers  of  individual  sites  in  ~  space  to 
construct  the  macroscopic  density,  n  (  ~  f),  and  mean  velocities,  ~  (~  f), 
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which  enter  the  fluid  equations  directly.  This  problem  is  shared  by  all  kinetic 
theories,  of  course.  Indeed,  the  point  of  solving  macroscopic  equations  such 
as  those  of  Navier-Stokes  is  to  first  filter  out  the  microscopic  spatial  and 
temporal  scales,  and  then  solve  for  n  and  ~. 


4.3  Conventional  Methods:  Advantages 


For  conventional  solutions  of  the  Navier-Stokes  equations,  the  physical 
assumptions  are  more  immediately  apparent.  The  physical  model  is  also  more 
flexible,  since  for  example  additional  terms  may  be  added  or  the  magnitude 
of  the  viscosity  may  be  changed  without  altering  the  underlying  spatial  or 
grid  structure.  Conventional  fluid  dynamics  seems  better  able  to  deal  with 
a  large  dynamic  range  in  spatial  or  velocity  coordinates.  This  is  partially 
because  of  the  more  favorable  scaling  to  high  Reynolds  number  discussed 
above,  and  partially  due  to  the  ability  to  use  adaptive  grid  techniques  in 
those  locations  where  high  spatial  resolution  is  needed,  without  having  to  use 
the  fine  grid  spacing  in  those  locations  where  nothing  much  is  going  on.  They 
also  have  the  potential  advantage  that  the  assumption  of  incompressibility 
can  be  incorporated  explicitly  in  the  algorithm  if  the  flow  is  highly  subsonic, 
leading  to  a  substantial  savings  in  computer  time.  By  contrast  there  is  no 
comparable  technique  for  taking  advantage  of  incompressibility  for  cellular 
automata. 

Again  this  is  because  C.A.  share  with  kinetic  theories  or  molecular  dy¬ 
namics  the  feature  of  calculating  all  time  scales  at  once. 


4.4  Conventional  Methods:  Disadvantages 

Conventional  fluid  techniques  have  more  complexity  and  more  bits  per 
cell  than  cellular  automata.  Per  cell,  they  thus  have  higher  memory  require¬ 
ments.  The  floating-point  operations  required  for  finite-difference  solutions 
of  the  Navier-Stokes  equations  take  much  longer  to  perform,  per  operation, 
than  the  simple  logical  operations  or  table  look-ups  of  the  cellular  automata 
case.  Conventional  fluid  techniques  are  not  as  easy  to  parallelize  as  cellular 
automata,  and  they  may  in  some  cases  be  less  able  to  deal  with  complex 
boundary  conditions. 
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4.5  Comparison  on  the  Basis  of  Computational  Work 


Here  we  attempt  to  quantify  some  of  the  explicitly  computational  pros 
and  cons  of  the  two  methods.  Consider  solving  the  same  fluid-dynamics 
problem  two  ways:  using  cellular  automata  and  using  conventional  methods 
based  on  finite-difference  solution  of  the  Navier-Stokes  partial  differential 
equations.  We  compare  the  computational  work  needed  in  the  two  methods. 
We  emphasize  3-D  applications,  since  in  our  judgment  these  represent  the 
next  major  step  in  computational  fluid  dynamics  over  the  coming  decade. 

When  the  specifics  of  computer  architecture  enter  our  discussion,  we  shall 
compare  a  conventional  Navier-Stokes  fluid  calculation  performed  on  a  Cray- 
2  class  supercomputer  with  a  cellular  automata  calculation  performed  on  a 
special-purpose  massively  parallel  machine  that  does  not  exist  today.  Some 
details  of  this  hypothetical  special-purpose  machine  were  described  in  Section 
3;  others  will  emerge  as  the  discussion  proceeds. 

Both  numerical  techniques  involve  subdividing  the  fluid  volume  of  interest 
into  a  discrete  grid  or  lattice.  First  we  discuss  how  many  grid  or  lattice  points 
are  needed  for  the  two  approaches  (Reference  2). 


4.5.1  Number  of  Grid  or  Lattice  Points 

Let  L  be  a  macroscopic  scale  length  characterizing  the  fluid-dynamics 
problem,  and  let  U  be  a  macroscopic  velocity.  For  example  L  might  be  the 
length  of  an  obstruction  in  the  flow,  the  width  of  an  shear  layer  in  a  jet,  or 
the  thickness  of  a  channel  or  pipe  through  which  the  fulid  is  flowing.  The  cell 
size  for  both  the  conventional  and  the  cellular  automata  calculations  must  be 
clearly  be  much  smaller  than  L,  so  as  to  resolve  the  macroscopic  structure. 

Figure  4-1  depicts  the  relations  between  the  macroscopic  scale  L ,  the  size 
of  a  cell  lc  in  the  fluid  code,  and  the  lattice  spacing  a  in  the  cellular  automata 
model.  Because  the  equivalent  of  fluid  quantities  must  be  determined  in  the 
cellular  automata  model  by  averaging  over  the  noisy  data  of  many  lattice 
points,  it  is  clear  that  L  »  tc  »  a. 
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In  a  conventional  hydrodynamics  calculation,  the  viscosity  determines 
the  required  size  of  the  cell,  because  a  cell  should  resolve  the  Kolmogorov 
dissipation  scale  ry. 

(ckt)  =  {L  v3/U3)1'4  =  L  Re~3/ 4,  (4-1) 

where  v  is  the  viscosity  and  Re  is  the  Reynolds  number, 

Re  =  U  Lfv.  (4  -  2) 

Thus  from  (4-1),  the  number  of  fluid  cells  in  a  length  L  is 

L/n  ss  Re314.  (4  -  3) 

This  quantity,  Re3/4,  represents  the  dynamic  range  in  scale  sizes  which  the 
calculation  must  resolve,  since  in  any  case  the  finite  viscosity  will  not  permit 
shear  flow  to  develop  on  scales  smaller  than  rj.  The  dynamic  range  in 
velocities  required  in  the  calculation  is  of  order  Re1/4. 

From  Equation  (4-3)  we  see  that  in  the  conventional  fluid-dynamic  cal¬ 
culation,  the  number  of  cells  required  in  three  dimensions  is  approximately 

JVcell(3D  fluid)  =  {L/vf  =  Re*/4.  (4  -  4) 

It  is  well  known  that  this  strong  scaling  of  the  number  of  cells  with  Reynolds 
number  makes  it  difficult  to  carry  out  3-D  fluid-dynamics  calculations  even 
at  Reynolds  numbers  of  a  few  hundred.  For  example  if  one  used  a  three 
dimensional  grid  with  64  grid  points  on  a  side  (2.6  x  105  total  grid  points  in 
3-D),  one  could  by  the  above  criterion  do  a  good  job  with  a  Navier-Stokes 
fluid  algorithm  simulating  Reynolds  numbers  up  to  about  (64  )4/3  =  256.  To 
simulate  a  Reynolds  number  of  1000  would  require  178  grid  points  on  a  side, 
or  5.6  x  106  fluid  cells  in  all. 

For  the  celluar  automata  case,  there  are  several  different  criteria  which 
must  be  met  in  order  to  produce  a  physically  meaningful  simulation  of  a  fluid. 
These  are  reviewed  in  Reference  2.  For  our  purposes  the  most  stringent 
condition  turns  out  to  concern  the  way  in  which  collisions  in  the  celluar 
automata  model  must  represent  the  viscosity  of  the  fluid,  when  averaged 
over  many  cellular  automata  nodes  or  lattice  points. 

In  the  cellular  automata  model,  all  particles  move  with  one  discrete  ve¬ 
locity  (or  at  most  a  few).  We  shall  call  this  velocity  v If  the  mean  free 
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path  of  a  cellular  automata  particles  is  A,  then  the  effective  viscosity  v  which 
the  model  will  produce  when  averaged  over  distances  much  longer  than  the 
mean  free  path  is 

(4  -  5) 

The  relation  between  the  lattice  spacing  a  and  the  mean  free  path  A  varies 
with  the  collision  rules  chosen  for  the  cellular  automata.  In  some  models  the 
mean  free  path  will  be  considerably  larger  than  a,  if  the  rules  state  that  a 
scattering  event  can  only  occur  when  the  lattice  sites  which  will  be  the  final 
states  of  the  collision  are  initially  unoccupied.  In  other  cases  the  mean  free 
path  will  be  comparable  with,  or  even  in  some  cases  less  than,  a;  this  may 
be  the  situation  when  rules  state  that  there  can  be  more  than  one  cellular 
automata  particle  occupying  a  given  site. 

In  the  discussion  to  follow,  we  shall  assume  that  the  mean  free  path  is 
approximately  comparable  with  the  lattice  spacing,  a.  When  one  averages 
over  distances  long  compared  to  the  mean  free  path,  the  macroscopic  viscosity 
v  will  be  roughly 

v  =  A  v,  =  a  vs.  (4—6) 

Dividing  this  inequality  by  a  and  multiplying  by  the  macroscopic  scale  length 
L,  this  implies  that  the  number  of  lattice  sites  in  a  macroscopic  scale  length 
L  is 

L/c  a  L  v./v  =  (—)(£)  =  Rc/M,  (4  -  7) 

v  U 

where  M  is  the  Mach  number, 

M  =  U/v,.  (4  -  8) 

Equation  (4-7)  is  the  cellular  automata  counterpart  of  Equation  (4-3)  for  the 
conventional  fluid-dynamics  case.  The  Reynolds  number  Re  is  considerably 
larger  than  unity  in  most  cases  of  interest.  Additionally,  the  Mach  number 
M  must  be  small  because  it  is  shown  in  Section  6  of  this  report  that  the 
cellular  automata  model  does  not  reproduce  Navier-Stokes  fluid  dynamics 
unless  M  «  1.  As  a  consequence,  the  scaling  for  cellular  automata  given 
in  Equation  (4-7)  is  even  more  stringent  than  that  for  conventional  fluid 
calculations  given  in  Equation  (4-3).  The  number  of  lattice  points  required 
in  three  dimensions  is  approximately 

Nce]i  (3D  C.A.)  Si  (L/af  Si  (Re/M)3.  (4  -  9) 
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To  continue  our  numerical  example,  using  Re  =  256  and  Mach  number  M 
=  0.2  the  number  of  cells  required  is  2.1  x  109.  A  Reynolds  number  of  1000 
would  require  1.3  x  1011  cells  for  a  Mach  number  of  0.2. 

Figure  4-2  illustrates  the  number  of  cells  required  for  the  equivalent 
Navier-Stokes  and  cellular  automata  calculations,  at  several  Mach  numbers. 


In  three  dimensions,  the  ratio  of  the  number  of  grid  points  needed  for 
cellular  automata  and  conventional  fluid-dynamical  calculations  of  the  same 
problem  is 


(vhf  = 


L/a]3 


»  1. 


(4-10) 


L/t)\  A/3 

Consider  a  numerical  example  just  discussed,  where  the  fluid  dynamics  cal¬ 
culation  has  643  grid  points.  The  cellular  automata  model  needs  a  factor 
of 


Re3/4/M 3  =  8000 


more  grid  points  than  the  fluid  dynamics  calculation  would  need.  For  a 
Reynolds  number  of  1000,  the  cellular  automata  model  would  need  a  factor 
of  2.2  x  104  more  grid  points  than  the  fluid  model. 


4.5.2  Number  of  Bits  per  Lattice  or  Grid  Point 

The  computational  effort  needed  to  solve  a  given  problem  depends  on  a 
number  of  factors  besides  the  number  of  lattice  or  grid  points.  In  the  next 
few  subsections  we  consider  these  in  turn.  First,  we  discuss  the  number 
of  bits  needed  at  each  grid  or  lattice  position,  to  describe  the  state  of  the 
fluid  or  cellular  automata  model.  This  is  ultimately  related  to  the  memory 
requirements.  In  general  cellular  automata  models  will  require  considerably 
less  memory  per  cell,  but  will  have  many  more  cells  than  the  conventional 
models. 

For  the  simplest  triangular-lattice  cellular  automata  models  in  two  di¬ 
mensions,  the  magnitude  of  a  particle’s  velocity  is  always  v,,  and  after  a 
scattering  event  each  particle  goes  in  one  of  6  discrete  directions.  Thus  in 
the  simplest  case  there  are  6  bits  per  node  to  keep  track  of. 

However  the  situation  is  more  complex  in  three  dimensions,  since  the 
required  lattices  may  not  have  simple  symmetries  (see  Sections  2  and  5  of 
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Number  Of  Cells 


Reynolds  Number 


Figure  4-2.  Number  of  fluid  cells,  NCfu  (3D  fluid),  required  in  three  dimensional  Navier-Stokes 
computation,  compared  with  number  of  cellular  automata  lattice  points,  Nc*u  (C.  A  ) 
required  to  perform  the  equivalent  calculation.  Memory  limits  are  derived  in 
Sections  4.5.2  and  4  5.3.  they  correspond  in  the  cellular  automata  case  to  a  106-chip 
parallel  supercomputer  with  several  megabits  of  local  memory  per  node,  and  in  the  fluid 
case  to  a  Cray-2  with  64  million  words  of  shared  memory. 
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the  present  report).  The  number  of  bits  required  per  node  ranges  from  14  at 
the  low  end  to  30  or  40  at  the  high  end,  depending  on  the  lattice  geometry 
and  collision  rules  chosen.  For  the  purposes  of  this  discussion  we  will  estimate 
that  about  20  bits  per  node  are  required  in  three  dimensions. 

Next  we  consider  the  number  of  bits  required  at  each  cell  or  grid  point  in 
a  conventional  fluid-dynamics  calculation,  recalling  that  each  fluid- dynamic 
variable  is  now  typically  described  by  a  64-bit  word.  Since  the  Mach  number 
for  cellular  automata  models  must  be  small,  it  is  appropriate  to  compare 
these  models  with  the  computational  requirements  of  incompressible  hydro¬ 
dynamics.  In  this  case  the  only  independent  variables  are  two  components  of 
the  velocity,  since  the  third  component  is  determined  from  div  ~  =  0.  The 
pressure  is  found  from  solving  a  Poisson  equation  of  the  form 


P  =  Q{v. 


xi  vyi 


Vz). 


(4-11) 


However  computationally  one  must  in  fact  utilize  more  than  just  two  64-bit 
words.  For  example  since  one  must  find  the  pressure  by  solving  Equation 
(4-11),  the  pressure  and  all  three  velocity  components  must  be  known  or  cal¬ 
culated  at  each  cell  or  grid  point.  A  previous  JASON  report  (Reference  3) 
analyzed  one  particular  3-D  finite-difference  technique  for  solving  Equation 
(4-11),  and  found  that  of  order  10  words  per  cell  were  needed,  or  approxi¬ 
mately  640  bits  (if  the  words  were  64  bits  each). 

Thus  very  roughly,  the  ratio  of  the  number  of  bits  per  cell  or  node  required 
for  cellular  automata  and  fluid  calculations  is 


Nbits(Cellular  Automata)  ^  20  2 

Nbits(Flu*d  Dynamics)  640  X 


(4-12) 


As  anticipated,  there  are  considerably  fewer  bits  per  node  required  for  the 
cellular  automata  case.  However  the  total  memory  requirements  for  cellular 
automata  may  still  be  more  than  for  the  fluid-dynamics  calculation,  since 
the  number  of  cells  is  larger: 

Total  Memory  (Cellular  Automata)  „  (3.1  x  10-2)Re3/4 
Total  Memory  (Fluid  Dynamics)  A/3  ^  ^ 

For  our  previous  numerical  examples,  Re  =  256,  M  =  0.2,  the  cellular  au¬ 
tomata  model  requires  250  times  more  total  memory  than  the  fluid  dynamics 
calculation,  but  only  l/32nd  the  amount  of  memory  per  node.  A  calculation 
at  a  Reynolds  number  of  1000  require  690  times  more  total  memory  for  a 
cellular  automata  model  than  for  a  fluid  one. 
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4.5.3  Number  and  Speed  of  Operations  per  Timestep 


Another  way  to  compare  cellular  automata  techniques  with  conventional 
fluid  dynamics  is  to  look  at  the  required  number  of  operations  per  timestep, 
and  the  speed  with  which  they  can  be  executed. 


First  we  consider  the  number  of  operations  required  by  conventional  fluid 
dynamics.  These  will  of  course  vary  with  the  choice  of  algorithm.  Here  we 
choose  one  example,  which  we  hope  is  typical.  Reference  3  studied  one  3-D 
incompressible  fluid  dynamics  algorithm  in  detail,  and  found  the  following 
number  of  total  operations  per  time  step,  for  a  computational  grid  of  Nx,  Ny, 
and  Nz  zones  in  the  x,y,  and  z  directions: 

Additions: 

NxNyNz{ 95  +  3  log2  NxNy) 


Continuing  our  numerical  example,  if  Nx  —  Ny  =  Nz  =  64,  corresponding 
to  a  Reynolds  number  of  256,  there  would  be  3.4  xlO7  additions,  2.8  x  107 
multiplications,  and  7.3  x  106  memory  transfers  per  timestep  for  the  fluid 
calculation.  Per  fluid  cell,  there  would  be  131  additions,  105  multiplications, 
and  28  memory  transfers  per  timestep.  Of  course  each  of  the  additions  and 
multiplications  is  a  floating-point  operation. 


Conventional  fluid  calculations  running  on  a  Cray-2  computer  in  vector 
mode,  with  pipelining  of  memory  fetches,  would  require  a  clock  cycle  of  about 
4  nsec  for  each  of  the  required  adds,  multiplies,  and  memory  fetches. 


For  a  3-D  cellular  automata  model,  the  number  of  required  operations 
per  timestep  is  considerably  more  uncertain,  since  it  depends  on  the  partic¬ 
ular  choice  of  lattice  and  collision  rules,  neither  of  which  has  yet  been  well 
explored.  Here  we  consider  for  the  sake  of  definiteness  the  “4-D”  lattice  de¬ 
scribed  in  Section  2  of  this  report.  For  this  lattice,  18  bits  at  each  node  are 
required  to  describe  the  “state  vector”  of  the  system. 


First  we  ask  how  complex  the  collision  rules  are,  and  in  particular  how 
much  space  it  would  take  to  realize  them  on  a  VLSI  chip. 
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The  rule  function  accepts  18  bits  to  represent  incoming  particles,  utilizes 
2  to  4  random  bits,  and  produces  18  bits  to  represent  the  outgoing  particles 
after  each  collision.  A  straightforward  table  look-up  for  the  collision  rules 
would  thus  require  a  table  of  about  106  entries.  Symmetry,  however,  greatly 
reduces  the  functional  complexity.  A  rough  estimate  based  on  collision  rules 
designed  by  Rothaus  (Section  2  of  the  present  report)  indicates  that  there  are 
24  “parallel  collision  sites”  per  lattice  point.  If  the  rule  functions  were  to  be 
hard-wired,  as  in  a  special-purpose  computer  devoted  exclusively  to  cellular 
automata,  each  "parallel  collision  site”  would  require  about  8  inputs  and  4 
outputs  or  about  500  transistors  each  in  a  PLA  (Programmed  Logic  Array). 
Thus  the  24  “parallel  collision  sites”  needed  for  each  rule  function  would 
require  about  104  transistors  or  1%  of  the  area  of  a  VLSI  chip,  exclusive  of 
wiring.  There  could  therefore  be  about  102  collision  rule  calculation  engines 
per  chip,  and  the  collision  rule  calculation  could  be  parallelized  by  a  factor 
of  up  to  100. 

The  limiting  number  of  lattice  points  per  chip  is  easily  calculated  by 
assuming  that  approximately  20  bits  are  needed  to  represent  one  lattice  point. 
Random  access  memory  chips  can  presently  contain  up  to  4  x  106  bits.  So 
up  to  200,000  lattice  points  can  reside  in  a  single  chip.  Since  today  about 
106  chips  can  be  practically  assembled  into  a  super  computer-scale  system, 
the  number  of  cellular  automata  lattice  points  is  limited  to  about  2  x  1011. 

A  given  chip  will  have  a  fixed  area,  so  there  is  a  design  tradeoff  betwen 
the  number  of  lattice  point  states  that  can  be  put  on  a  chip.  If  equal  chip 
area  were  allocated  to  each  of  these  functions,  we  would  have  105  lattice 
point  states  per  chip  (or  10u  total  lattice  points  in  our  hypothetical  106-chip 
supercomputer),  and  50  parallel  lattice  point  engines  per  chip.  According  to 
Equation  (4-9),  the  Reynolds  number  accessible  to  cellular  automata  calcu¬ 
lations  on  this  special-purpose  computer  would  then  be  less  than  or  equal 
to  1000  at  a  Mach  number  of  0.2,  and  less  than  or  equal  to  300  at  a  Mach 
number  of  0.05.  To  access  Reynolds  numbers  larger  than  these  values,  lattice 
points  could  be  stored  on  disc  rather  than  in  local  memory,  and  then  shut¬ 
tled  in  and  out  of  the  computational  nodes  as  needed.  This  use  of  "virtual 
memory”  would  be  very  costly  in  execution  time,  however. 

In  principle  one  could  also  increase  the  accessible  Reynolds  numbers  some¬ 
what  by  allocating  a  larger  fraction  of  the  chip  area  to  lattice  points,  at  the 
expense  of  parallelism  in  the  rule  engines.  But  since  the  accessible  Reynolds 
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number  only  increases  as  the  1/3  power  of  the  number  of  lattice  points,  there 
would  not  be  a  great  deal  to  be  gained  by  allocating  more  than  50%  of  the 
chip  area  to  lattice  points. 

This  discussion  leads  us  to  a  preliminary  view  of  the  required  charac¬ 
teristics  for  the  special-purpose  cellular  automata  computer  which  we  are 
postulating.  The  desire  to  simulate  fluid  behavior  in  three  spatial  dimen¬ 
sions  and  at  large  Reynolds  numbers  requires  a  great  many  lattice  points, 
and  implies  a  massively  parallel  architecture  which  nevertheless  has  a  great 
many  (of  order  105)  lattice  points  at  each  computational  node  or  chip.  This 
is  in  contrast  to  the  prevalent  practice  to  date,  in  which  for  two  spatial  di¬ 
mensions  and  moderate  Reynolds  numbers  one  can  devote  one  computational 
node  to  a  single  lattice  point.  The  cost  of  placing  many  lattice  points  at  each 
computational  node  is  some  increased  complexity,  due  to  the  requirement 
to  communicate  to  neighboring  lattice  points  which  are  sometimes  off-chip 
and  sometimes  on-chip.  However,  with  appropriate  lattice  layout  the  bulk 
of  these  nearest-neighbor  communications  will  occur  within  a  single  chip, 
and  thus  the  considerably  slower  off-chip  communication  times  will  not  be 
as  much  of  a  penalty  as  they  are  today. 

A  second  contrast  between  this  type  of  computer  and  today’s  parallel 
machines  is  in  the  sophistication  and  cost  of  each  computational  node.  Today, 
devices  such  as  the  “connection  machine”  endeavor  to  use  computational 
nodes  which  are  relatively  simple,  inexpensive,  and  which  have  only  a  modest 
amount  of  local  memory.  The  computer  which  we  have  envisioned  here  would 
require  several  megabits  of  local  memory  at  each  computational  node,  and 
would  in  addition  use  custom  designed  VLSI  rule  engines  at  each  node  to 
achieve  adequate  parallelism.  These  requirements  are  due  to  the  desire  to 
perform  3-D  simulations  at  Reynolds  numbers  of  102  -  103. 


4.5.4  Number  of  Timesteps 

The  different  grid  sizes  and  algorithms  will  impose  different  requirements 
on  the  number  of  timesteps  required  to  perform  the  same  physical  calculation 
using  cellular  automata  or  conventional  fluid-dynamics  models. 
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For  cellular  automata,  there  is  a  genuine  Courant  condition  since  “sound 
waves”  can  propagate  at  the  velocity  v4.  Thus  the  timestep  is  limited  to  the 
time  it  takes  a  sound  wave  to  cross  from  one  lattice  point  to  another: 

A  tca  =  a/v,.  (4  -  15) 

From  Equation  (4-7),  this  is  equivalent  to 

A  tca  =  (L/U)(M2  /  Re)  (4-16) 


For  incompressible  fluid  dynamics,  there  is  not  a  Courant  condition  be¬ 
cause  sound  waves  effectively  move  instantaneously  around  the  grid.  Rather, 
the  timestep  is  determined  by  the  time  for  the  smallest  scale  eddies  to  evolve. 
Since  the  smallest  spatial  scale  is  the  grid  spacing  rj  =  L  Re~3^4  and  the 
smallest  velocity  is  the  corresponding  eddy  velocity  vn  3?  U  Re-1/4,  the 
evolution  time  for  the  smallest  eddies  is  tj/v v.  Thus  the  timestep  is  approxi¬ 
mately 

A'fluid  S  (I/tf)  tfe-1'2.  (4  -  17) 

The  ratio  of  the  minimum  timestep  size  for  the  cellular  automata  and  fluid 
dynamics  models  is 


A  tcJ Atfluid  a  M2  Re~l/2.  (4  -  18) 

Continuing  with  our  numerical  example  of  Re  =  256  and  M  =  0.2,  this  would 
imply  that  the  cellular  automata  model  would  need  Re1^2 /M2= 400  times 
more  timesteps  to  calculate  the  same  physical  problem  as  the  fluid  dynamics 
model.  For  a  Reynolds  number  of  1000,  the  cellular  automata  calculation 
would  require  about  790  times  more  timesteps  than  the  equivalent  Navier- 
Stokes  calculation. 


4.5.5  Ease  of  Adapting  to  a  Parallel  Computation  Environment 


A  final  factor  influencing  the  relative  promise  of  the  cellular  automata 
and  conventional  fluid  dynamics  approaches  is  the  ease  with  which  each  can 
be  adapted  to  a  computational  environment  which  is  highly  parallel.  By  de¬ 
sign,  the  cellular  automata  algorithms  are  well  suited  to  massively  parallel 
computer  architectures;  one  can  assign  a  node  or  group  of  nodes  to  a  specific 
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microprocessor  and  its  associated  memory.  Since  we  have  seen  that  the  mem¬ 
ory  requirements  per  lattice  point  or  cell  are  much  less  stringent  for  cellular 
automata  than  for  conventional  techniques,  and  since  only  logical  operations 
are  required,  each  microprocessor  can  be  relatively  unsophisticated.  In  ad¬ 
dition,  the  collision  rules  are  designed  so  that  each  lattice  point  need  only 
communicate  with  its  nearest  neighbors.  Thus  the  communications  overhead 
is  not  severe,  provided  that  lattice  points  which  are  logically  adjacent  are 
also  connected  by  short  physical  paths  (see  Section  7  for  a  more  complete 
discussion  of  the  latter  issue). 

In  the  discussion  of  cellular  automata  which  follows,  we  shall  continue  to 
hypothesize  a  special-purpose  parallel  machine  which  does  not  exist  today 
but  which  we  believe  to  be  within  today’s  state-of-the-art:  a  computer  of 
order  106  computational  nodes,  a  1  nsec  clock,  and  about  4  megabits  of  fast 
local  memory  per  node.  Thus  before  any  additional  parallelization  at  the 
chip  level,  one  gains  a  factor  of  106  over  a  one-processor  machine. 

If  at  the  chip  level  one  uses  the  type  of  hard-wired  rule  engine  which  we 
discussed  in  Section  4.5.3,  there  is  an  additional  gain  of  a  factor  of  50  due  to 
the  ability  to  compute  collision  rules  for  50  lattice  points  at  once. 

Conventional  Navier-Stokes  hydrodynamics  can  also  be  adapted  for  par- 
alel  computing,  although  the  individual  processors  must  be  considerably  more 
powerful  than  those  needed  for  cellular  automata.  Typical  requirements  are 
32-  or  64-bit  words  and  fast  floating-point  arithmetic.  Many  algorithms  also 
need  substantial  shared  memory  accessible  by  all  the  processors. 

For  the  purpose  of  the  present  discussion,  we  contrast  massively  parallel 
cellular  automata  calculations  with  Navier-Stokes  hydrodynamics  computed 
on  a  4-processor  Cray-2  scale  machine.  Experience  to  date  suggests  that 
multiprocessing  on  such  a  machine  can  yield  speed-ups  in  elapsed  time  of 
about  a  factor  of  3.6,  relative  to  the  same  calculation  performed  on  one 
processor  alone. 


4.5.6  Overall  Comparison  of  Computational  Effort 

One  can  attempt  to  combine  the  criteria  developed  in  the  preceeding 
sections  into  an  overall  figure  of  merit  for  the  cellular  automata  approach, 
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relative  to  conventional  fluid  dynamical  methods.  This  figure  of  merit  is 
in  some  ways  a  narrow  one,  in  the  sense  that  it  does  not  take  into  account 
practical  issues  such  as  mesh  tangling,  numerical  stability,  accuracy  and  noise 
properties,  flexibility,  and  so  forth.  What  our  overall  figure  of  merit  does 
measure  is  the  relative  amount  of  elapsed  computer  time  needed  to  perform 
the  same  physical  calculation,  using  the  two  methods. 

What  we  describe  as  the  computational  effort  is  an  estimate  of  the  overall 
computer  time  needed  to  complete  a  fluid-dynamics  calculation  of  physical 
duration  L/U,  the  timescale  for  macroscopic  evolution  of  the  flow.  Of  course 
most  actual  computations  will  be  carried  out  for  physical  durations  longer 
than  this.  But  since  the  number  of  L/U  times  required  will  vary  with  the 
specific  problem  being  solved,  we  shall  use  L/U  as  a  convenient  scaling  pa¬ 
rameter. 

The  computational  effort  thus  defined  is  made  up  of  a  series  of  factors: 

Effort  =  Arcejj  x  Ntimesteps  x  (parallelization  speed-up)-1  x 
£ 

operations  [operation  speed  x  -^0perajj0ns/  cell/timestep].  (4-19) 

The  comparison  we  shall  present  is  unfair  in  one  important  way:  it  com¬ 
pares  an  as  yet  unbuilt  special-purpose  parallel  supercomputer  for  cellular 
automata  calculations  with  a  four  or  five  year-old  Cray-2  machine  for  con¬ 
ventional  hydrodynamics.  In  a  sense  one  is  comparing  computers  which  are 
a  generation  apart,  and  the  reader  should  be  cautioned  that  this  inserts  a 
bias  favoring  cellular  automata  algorithms  on  massively  parallel  machines. 

Table  4-1  summarizes  the  properties  needed  for  the  evaluation  of  Equation 
(4-19).  We  have  assumed  that  Cray-2  memory  fetches  are  pipelined  and  are 
thus  executable  in  one  clock  cycle,  and  that  adds  and  multiplies  are  in  vector 
mode. 

Inserting  the  values  from  Table  4-1  into  Equation  (4-19),  we  obtain  for 
the  Navier-Stokes  calculation  on  a  Cray-2  class  machine: 

Effort  (Navier-Stokes)  =  10-7  Re11^4  (1  +  O.161og10  Re)sec,  (4  —  20) 

where  we  have  assumed  that  the  adds  give  the  dominant  timing.  For  the 
special-purpose  cellular  automata  computer,  the  effort  is  approximately 

Effort  (C.A.)  “  2  x  10-,7^e4M-5sec.  (4-21) 
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Table  4-1 


Comparison  of  Cellular  Automata  with  Incompressible 
Fluid  Algorithmns  in  Three  Dimensions 


Property 

Cellular  Automata 
Hypothetical 
Parallel  Computer 

Imcompressible 

Fluid, 

Cray-2  Computer 

^cell 

(Re/M3) 

Re9/4 

^timesteps 

Re/M2 

Re1/2 

Speedup  due  to 

106  chips  x  50 

parallelization 

rule  engines/chip 

3.6 

=  5  x  107 

N 

1  operations 

Rule  engine:  1 

Memory  fetch:  28 

per  cell 

Add:  95  +  4.5  log2Re 

pet  timestop 

Multiply:  81  +  3  log2Re 

Operation 

1  nsec 

4  nsec 

speed 
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The  resulting  effort  values  are  tabluated  in  Table  4-2  for  several  Reynolds 
numbers  and  two  Mach  numbers,  assuming  that  the  cellular  automata  rule- 
engine  parallelism  is  a  factor  of  50.  The  parentheses  for  higher  Reynolds 
numbers  and  low  Mach  numbers  indicate  that  the  cellular  automata  calcu¬ 
lation  would  not  fit  on  a  special-purpose  computer  with  106  total  chips,  if 
the  size  of  a  typical  chip  is  limited  to  4  megabits.  Similarly,  for  the  fluid 
case  the  parentheses  indicate  that  the  calculation  would  not  fit  in  the  64 
million  word  memory  of  a  Cray-2.  Under  both  of  these  circumstances,  vir¬ 
tual  disc-based  memory  could  be  utilized,  but  at  a  substantial  degradation 
in  execution  speed. 

Figure  4-3  illustrates  the  relation  between  computational  effort  and  Reynolds 
number  graphically.  One  sees  that  for  a  Mach  number  of  0.2  and  Reynolds 
numbers  between  100  and  1000,  the  cellular  automata  model  is  predicted  to 
execute  about  three  orders  of  magnitude  faster  than  the  conventional  fluid 
calculation.  To  be  sure,  we  have  made  several  quite  idealized  assumptions 
concerning  the  charactersitics  of  our  hypothetical  cellular  automata  com¬ 
puter.  But  even  if  one  therefore  gives  the  fluid  calculation  credit  for  an 
additional  factor  of  10  in  speed,  due  for  example  to  an  improved  next  gen¬ 
eration  of  general-purpose  supercomputers,  the  cellular  automata  approach 
appears  to  be  quite  promising. 

The  same  cannot  be  said,  however,  for  the  case  of  Mach  number  equal 
to  0.05.  Here  the  two  approaches  show  much  more  comparable  execution 
speeds,  and  the  cellular  automata  technique  requires  so  much  more  memory 
that  it  cannot  plausibly  fit  in  a  106-chip  parallel  supercomputer  for  Reynolds 
numbers  larger  than  about  300.  Furthermore,  if  as  before  one  gives  the 
conventional  fluid  approach  credit  for  an  additional  factor  of  10  in  speed, 
the  fluid  algorithms  emerge  as  somewhat  superior.  The  cellular  automata 
approach  retains  interest  primarily  in  those  situations  where  it  has  a  unique 
advantage;  this  might  be  the  case,  for  example,  in  studies  of  fluid  behavior 
within  a  boundary  layer,  where  the  Reynolds  numbers  are  not  large  and  the 
boundary  properties  may  be  complex. 

Thus  our  conclusions  regarding  the  relative  execution  speeds  of  the  two 
approaches  depend  on  the  range  of  Mach  numbers  in  which  one  is  interested. 

If,  as  suggested  in  the  discussion  of  the  Chapman-Enskog  expansion  in  Sec¬ 
tion  6,  one  must  limit  Mach  numbers  to  very  small  values  in  order  to  assure 
that  the  cellular  automata  model  reduces  to  the  Navier-Stokes  equations, 
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Table  4-2 


Computational  Effort  for  Navier-Stokes  Fluid  and 
Cellular  Automata  (C.A.)  Methods  in  Three  Dimensions 


Reynolds 

Number 

Fluid  Effort: 
Cray-2 
(sec) 

C.A.  Effort: 
Special-Purpose 
Parallel 

Computer  (sec) 

Ratio, 
Fluid 
to  C.A. 
Effort 

100 

4.2  x  10"2 

6.3  x  10~6 

6.7  x  103 

256 

0.58 

2.7  x  10~4 

2.2  x  103 

1000 

26 

6.3  x  10~2 

4.2  x  102 

5000 

(2.4  x  103) 

(39) 

(60) 

4  2a.  Mach  number  equal  to  0.2 


Reynolds 

Number 

Fluid  Effort: 
Cray-2 
(sec) 

C.A.  Effort: 
Special-Purpose 
Parallel 

Computer  (sec) 

Ratio, 
Fluid 
to  C.A. 
Effort 

100 

4.2  x  lO'2 

6.4  x  10"3 

6.5 

256 

0.58 

2.8  x  10’1 

2.1 

1000 

26 

(64.5) 

(4.0  x  10'1) 

5000 

(2.4  x  103) 

(39.9  x  103) 

(60  x  lO"2) 

4-2b.  Mach  number  equal  to  0.05 
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Computational  Effort  (Seconds) 


Reynolds  Number 


Figure  4-3.  Overall  computational  effort  for  fluid  and  cellular  automata  models,  as  a  function  of 

Reynolds  number.  In  our  definition,  computational  effort  is  a  measure  of  the  elapsed  time 
required  to  perform  a  given  calculation,  exclusive  of  I/O,  averaging  in  the  cellular 
automata  case,  and  diagnostics  Memory  limits  for  cellular  automata  are  derived  in 
Sections  4.5  2  and  4  5.3,  and  correspond  to  a  106-chip  parallel  supercomputer  with 
several  megabits  of  local  memory  per  node 
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then  for  3-D  problems  the  cellular  automata  technique  does  not  appear  to 
have  notable  advantages  over  conventional  fluid  techniques.  Under  these  cir¬ 
cumstances  we  do  not  imagine  that  there  will  be  interest  in  developing  or 
purchasing  an  expensive  special-purpose  parallel  computer  for  cellular  au¬ 
tomata  work. 

It  may  turn  out,  on  the  other  hand,  that  Mach  numbers  as  high  as  one  or 
two  tenths  are  adequate  for  assuring  the  reduction  of  the  cellular  automata 
results  to  the  Navier-Stokes  model.  Only  considerably  more  experience  in 
doing  3-D  calculations  will  suggest  whether  this  is  the  case.  If  it  is,  then  the 
factors  of  100  -  1000  improvement  in  execution  speed  suggested  in  Figure  4-3 
for  M  =  0.2  would  be  a  powerful  incentive  for  further  pursuing  the  design 
and  execution  of  a  dedicated  massively  parallel  supercomputer  for  cellular 
automata  work. 
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5  QUASILATTICES  FOR  CELLULAR  AU¬ 
TOMATON  FLUID  CALCULATIONS 

5.1  Introduction 


It  is  possible  that  lattice  models  of  hydrodynamics  with  discrete  velocities, 
or  “cellular  automata  fluids,”  may  prove  to  be  an  efficient  way  to  simulate 
complex  fluid  mechanics  problems,  at  least  for  virtually  incompressible  flows 
at  low  Mach  number.  Lattice  models  were  introduced  into  the  physics  lit¬ 
erature  in  1968  by  Kadanoff  and  Swift4,  who  were  interested  in  dynamic 
critical  phenomena,  and  later  by  Hardy  and  Pomeau5,  who  were  interested 
in  fluid  properties  in  general.  The  recent  attention  focused  on  these  models 
was  stimulated  by  the  observation  of  Frisch,  Hasslacher  and  Pomeau6  that 
cellular  automata  on  a  triangular  lattice  can  provide  good  approximations  to 
the  two-dimensional  Navier-Stokes  equations  of  an  isotropic  fluid.  Cellular 
automata  lend  themselves  to  highly  efficient  parallel  processing  on  digital 
computers.  A  discussion  of  different  models  and  methods  of  obtaining  ap¬ 
proximately  isotropic  equations  in  the  continuum  limit  has  been  given  by 
Wolfram.1 

Cellular  automata  fluid  models  usually  allow  a  collection  of  particles  to 
travel  with  one  of  a  few,  discrete  velocities  along  a  specified  set  of  directions 
or  “channels”  in  two  or  three  dimensions.  The  directions  are  specified  by 
a  set  of  unit  vectors  {ea}.  On  the  triangular  lattice,  for  example,  particles 
typically  travel  with  unit  velocity  along  one  of  the  six  directions  connecting 
a  given  site  to  its  neighbors.  The  models  determine  the  time  evolution  of 
a  set  of  probabilities  /a(x,  £),  which  give  the  fraction  of  particles  in  velocity 
channel  a  at  position  x  and  time  t.  The  number  density  n(x,t),  momentum 
density  <7,(x,  t),  and  stress  tensor  <Jij(x,t )  of  the  fluid  are  determined  from 
the  /0’s  via  the  relations 


n 


9 


E 

a 

E 

a 

E 

a 


fa 

(5-1) 

fa?a 

(5-2) 

«fa- 

(5-3) 
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The  Navier-Stokes  equations  may  be  written 

dtg,  +  djOij  =  0  (5-4) 

where  the  momentum  density  is  related  to  the  fluid  velocity  u(x,  t)  by  g  =  nu. 
A  closed  set  of  equations  for  u  results  if  we  can  express  aXJ  in  terms  of  u.  As 
shown,  for  example,  by  Wolfram1,  the  Chapman-Enskog  expansion  in  small 
velocities  and  gradients  for  crtJ  takes  the  form 

°ij  =  T  X  ea^  +  el  X  eaeieauk  (5-5) 

”  a  a 

+  e2  ^X  e«eieici  -  ]/2^  UfcUl 

4  fc  <eaefea  -  J/2]C 

+  [ . . . ] 

If  the  { ea }  connect  nearest  neighbors  on  a  triangular  lattice,  the  sums  over 
a  in  Equation  (5-5)  are  guaranteed  to  be  isotropic  up  to  quantities  fourth 
order  in  {ea}6: 

Y,  a  6tJ  (5  -  6) 

a 

X!  ea<eaeI  «  &ij&u  +  SijSj,  +  6a6jk.  (5  -  7) 

a 

Although  Equations  (5-6)  and  (5-7)  are  enough  to  ensure  that  the  usual 
truncation  of  the  gradient  expansion  leading  to  the  Navier-Stokes  equations 
is  isotropic,  higher  order  corrections  involving  sixth  rank  tensors  will  be 
anisotropic.  There  is,  moreover,  no  simple  analogue  in  three  dimensions  of 
the  triangular  lattice  which  ensures  that  even  fourth  rank  tensors  will  trans¬ 
form  properly  under  rotations7.  One  solution  to  this  problem  is  to  introduce 
additional  directions  (beyond  nearest  neighbors)  of  propagation  on  lattices 
with  low  symmetries  (the  (11)  direction  on  square  lattices  and  the  (110)  and 
(111)  directions  on  cubic  lattices,  for  example)  and  adjust  the  collision  rules 
to  eliminate  higher  order  anisotropies.  A  new  variant  of  this  solution  was  de¬ 
scribed  in  Section  2.  Another  approach,  mentioned  for  example  in  Reference 
1,  is  to  have  particles  moving  on  a  quasilattice,  which  can  have  symmetries 
which  are  forbidden  on  conventional  crystallographic  lattices. 

In  this  section  we  discuss  the  latter  approach,  and  show  in  particular 
how  to  construct  an  octagonal  quasilattice  in  two  dimensions  which  ensures 
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isotropy  of  all  tensors  up  to  eighth  order.  Particles  occupy  the  faces  of  a 
four-dimensional  hypercubic  lattice,  and  hop  from  face  to  face  along  a  two- 
dimensional  hyperplane  which  cuts  this  lattice  at  a  set  of  irrational  angles. 
When  projected  into  two  dimensions,  particles  flow  along  eight  channels  in 
a  way  which,  despite  its  apparent  complexity,  can  be  simply  described  us¬ 
ing  four  dimensional  integer  arithmetic.  Momentum  is  exchanged  among 
the  eight  channels  by  allowing  special  collisions  for  particles  intersecting  at 
right  angles,  and  two  distinct  velocities.  We  also  discuss  a  model  in  three 
dimensions  which  allows  particles  to  flow  along  one  of  thirty  directions  cor¬ 
responding  to  the  midpoints  of  the  bonds  of  an  icosahedron.  Among  these 
directions  can  be  found  ten  groups  of  six  which  are  coplanar,  and  point  to  the 
vertices  of  a  regular  hexagon.  This  feature  allows  three-particle  collisions  and 
momentum  transfer  between  channels.  The  model  requires  only  one  velocity, 
and  is  in  many  ways  reminiscent  of  a  triangular  lattice  cellular  automaton 
in  two  dimensions.  The  icosahedral  quasilattice  is  an  irrational  projection  of 
a  six  dimensional  hypercubic  lattice,  and  particle  positions  can  be  specified 
using  6d  integer  arithmetic. 


5.2  Octagonal  Quasilattices 

In  Figure  5- la  we  show  a  set  of  eight  basis  vectors  pointing  to  the  vertices 
of  a  regular  octagon.  One  might  naively  hope  to  generate  a  lattice  where 
particles  move  along  one  of  these  eight  directions  by  taking  integer  linear 
combinations, 

r  =  £  n,e,,  (5-8) 

i=i 

of  the  basis  vectors  ex  through  e4  (the  vectors  e5  through  e8  are  redundant, 
since  they  are  the  negatives  of  ex  through  e4).  This  clearly  will  not  work,  how¬ 
ever,  because  this  set  is  overcomplete,  and  the  basis  vectors  have  irrational 
projections  on  each  other:  If  arbitrary  quartets  of  integers  {nj ,  n2,  n3,  ti4  }  are 
allowed  in  (5-8),  we  will  fill  space  very  inefficiently  with  an  irregular  array  of 
points,  many  of  which  are  almost  on  top  of  one  another.  These  difficulties  are 
the  basis  of  a  theorem  of  classical  crystallography  which  states  that  regular 
lattices  with  an  octagonal  symmetry  are  impossible.  We  need  to  find  a  way 
to  restrict  the  integers  {n^}  so  that  the  “lattice”  points  are  approximately 
equally  spaced. 
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e3  e? 


(a) 


Figure  5-1.  Eightfold  basis  vectors  for  a  two  dimensional  octagonal  lattice.  The  basis  set  indicated  in  (a) 
can  be  regarded  as  the  projection  of  a  four  dimensional  basis  set.  as  shown  in  (b).  The 
projection  of  the  4D  basis  set  onto  a  perpendicular  "shadow"  plane  is  shown  in  (c). 
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One  solution  of  this  problem  is  embodied  in  Figure  5- lb,  where  we  choose 
to  regard  the  basis  vectors  {e*x }  as  a  two  dimensional  projection  of  an  or¬ 
thonormal  basis  set  in  four  dimensions.  A  projection  matrix  P  which  =>-*s  o 
unit  vectors  along  the  coordinate  axes  in  four  dimensions  to  give  the  vectors 
{ej1}  is 

y/2  1  0  -1 

1  y/2  1  0 

0  1  y/2  1 

-1  0  1  v/2 

The  components  of  the  je^  j  are  given  by  the  columns  of  this  matrix.  They 
occupy  a  common  2-D  plane,  and  it  is  easily  checked  by  taking  dot  products 
that  the  angles  between  them  are  consistent  with  the  octagonal  geometry 
of  Figure  5-la.  Let  us  call  this  subspace  the  parallel  or  “physical”  plane. 
This  will  be  the  space  occupied  by  the  quasilattice.  One  can  also  choose  to 
project  normal  to  this  plane  into  a  perpendicular  or  “shadow”  subspace  via 
the  matrix  Q  =  1  —  P, 

y/2  -1  0  1 

-1  y/2  -1  0 

0  -1  y/2  -1 

1  0  -1 

and  obtain  the  four  alternative  projections  of  the  four  dimensions  basis  set 
labelled  in  Figure  5-lc. 

The  idea  behind  the  projection  method  for  generating  quasilattices8  is 
illustrated  for  a  2-to-l  projection  in  Figure  5-2.  Here,  a  2-D  lattice  point 
(rii,n2)  is  projected  if  it  is  at  the  lower  left  hand  corner  of  a  square  which 
intersects  the  line  i.  The  sites  are  projected  onto  the  axis  labelled  ej|.  A 
useful  reformulation  of  this  projection  criterion  has  been  given  by  Elser8  in 
terms  of  the  projection  of  a  potential  quasilattice  point  onto  the  axis  labelled 
ij.:  A  potential  site  n  =  (n!,n2)  will  be  accepted  into  the  quasilattice  if  and 
only  if  its  projection  Qn  contained  in  the  shadowed  region  of  the  Xj.  axis 
Cx(fo), 

C±(x0)  =  Q[-C(o)]  +  x0 ,  (5-11) 

where  Q  [— C{o)\  is  the  set  formed  by  applying  Q  to  the  inversion  of  a  cube 
centered  at  the  origin,  and  x0  a  vector  marking  the  intersection  of  the  line  l 
with  the  ii  axis  (see  Figure  5-2).  By  varying  the  parameter  xQ,  we  obtain 


(5-10) 


(5-9) 
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a  continuous  family  of  possible  projections.  A  more  formed  definition  of  this 
set  of  lattice  points  (call  it  S(xa))  is 

5(f0)  =  j(ni,n2)eZ2|  «  neCi(f0)|  •  (5  -  12) 

This  set  is  indicated  by  the  darkened  portion  of  the  x±  axis  labelled  “shadow” 
in  Figure  5-2.  Its  bounded  extent  along  the  x±  axis  ensures  that  the  projected 
points  will  not  be  too  close  together. 

The  generalization  of  these  ideas  to  the  octagonal  4  — ♦  2  projection  is 
straightforward.  The  analogue  of  the  “shadow”  C±(x0)  in  Figure  5-2  is  the 
two  dimensional  shadow  of  a  4-D  hypercube,  which  is  the  interior  of  a  regular 
octagon.  If  the  4-D  lattice  has  unit  spacing,  it  can  be  shown  straightforwardly 
that  the  distance  between  parallel  faces  of  the  shadow  octagon  is  (see  Figure 
5-3) 

d=  1  +  V2/2.  (5-13) 

There  is  now  a  2-D  family  of  possible  projections  obtained  by  translating  the 
shadow  octagon  within  the  perpendicular  plane.  The  translation  vector  x0 
may  be  conveniently  written  in  terms  of  the  orthogonal  vectors 

\a>  =  1/2  (l,0,-l,V5)  (5-14) 

|6>  =  1/2  (-TV^, -1,0)  (5-15) 

as 

x0  =  sjo  >  +t\b  >  (5  —  16) 

where  s  and  t  are  arbitrary  real  parameters.  Note  that  the  transverse  pro¬ 
jection  operator  Q  may  be  written 

Q  =  ja  ><  aj  +  |6  ><  b\  (5  —  17) 

and  that  P\a  >=  P\b  >=  0.  Allowed  quasilattice  positions  n  =  {nj ,  n2,  n3,  n4 } 
are  now  determined  by  the  set 

S(x0)  =  |neZ4|  w  neCT(i?o)j  .  (5  —  18) 

It  is  easy  to  use  these  ideas  to  “grow”  an  octagonal  quasilattice  on  a 
computer.  A  program  written  in  Pascal  which  runs  on  an  IBM  PC  is  included 
in  the  Appendix.  The  program  tests  all  points  of  Z 4  inside  a  large  hypercube 
to  see  if  their  projection  falls  into  a  particular  shadow  octagon.  If  the  test 
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Figure  5-3.  Shadow  of  a  four  dimensional  cube,  obtained  by  applying  the  transverse 
projection  operator  Q  to  the  unit  cube  consisting  of  points  of  the  form 

r  =  H  X|e2,  where  0  <  x  <  1 
i  =  l 
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Figure  5-4.  Growth  of  an  octagonal  quasilattice.  There  are  321  vertices  in  the  bottom 
picture.  The  image  of  these  vertices  in  shadow  space  is  shown 
in  the  upper  right 
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is  succsssful,  it  then  draws  the  projection  into  physical  space  of  the  lines 
connecting  this  point  to  any  nearest  neighbors  which  also  happen  to  be  in 
the  set  S(x0).  A  growing  octagonal  quasilattice  is  shown  in  Figure  5-4.  The 
square  and  rhombic  cells  in  this  projection  are  the  images  of  faces  of  4-D 
hypercubes  which  intersect  the  projection  plane.  Also  shown,  in  the  upper 
right  hand  corner,  are  the  projections  onto  the  perpendicular  shadow  space 
of  the  quasilattice  vertices.  It  can  be  shown  using  the  methods  of  Reference 
8  that  the  shadow  octagon  is  filled  with  a  uniform  density  of  points  in  the 
limit  of  an  infinite  quasilattice.  This  feature  can  be  used  to  determine  the 
frequency  and  location  of  various  patterns  in  the  quasilattice8. 

One  way  an  octoganal  quasilattice  could  be  used  in  a  cellular  automaton 
simulation  is  illustrated  in  Figure  5-5.  In  the  absence  of  collisions,  particles 
in  one  time  step  hop  between  adjacent  cells  connected  by  a  common  set  of 
parallel  bonds.  Several  such  jagged  particle  trajectories  are  shown  in  the 
upper  portion  of  Figure  5-5.  Particles  moving  in  this  way  are  defined  to  have 
unit  velocity  in  a  direction  normal  to  the  bonds  they  are  traversing.  Binary 
collisions  occur  whenever  two  particles  occupy  the  same  cell,  as  illustrated 
in  the  lower  portion  of  the  figure.  Wolfram  has  suggested  using  octagonal 
quasilattices  to  improve  isotropy1,  but  instead  of  Figure  5-5  draws  a  picture 
like  that  shown  in  Figure  5-6.  The  straight  lines  shown  here  may  be  viewed 
as  approximations  to  the  jagged  trajectories  of  Figure  5-5.  We  find  our 
implementation  of  the  quasilattice  idea  preferable  because  particle  positions 
may  be  indexed  using  simple  4-D  integer  arithmetic,  rather  than  solving  for 
the  intersections  of  a  set  of  incommensurate  parallel  lines.  Note  also  that 
there  are  a  number  of  near  “coincidences”  in  Figure  5-6,  where  three  or  more 
lines  almost  intersect  in  a  point.  Such  potential  ambiguities  are  resolved 
automatically  in  Figure  5-5. 

All  cellular  automaton  models  with  only  binary  collisions  have  additional, 
unwanted  conservation  laws,  in  addition  to  conservation  of  particle  number, 
energy,  and  momentum1.  On  an  octagonal  quasilattice,  for  example,  the 
momentum  contained  in  particles  flowing  along  each  of  four  directions  (each 
direction  corresponding  to  a  pair  of  channels)  is  conserved.  This  is  because  a 
binary  collision,  like  that  shown  in  Figure  5-5,  extracts  a  net  zero  momentum 
from  one  direction,  and  adds  a  net  of  zero  momentum  to  another.  On  a  tri¬ 
angular  lattice,  equilibration  of  momentum  between  three  different  channels 
is  achieved  by  allowing  triple  collisions. 
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Figure  5-7  illustrates  one  way  to  achieve  a  similar  effect  on  an  octagonal 
lattice.  Particles  are  now  lalowed  to  have  velocities  0,  1,  or  \J2.  Collisions 
which  transfer  momentum  between  directions  occur  whenever  two  particles 
moving  with  unit  velocity  at  right  angles  to  each  other  meet  inside  an  octagon 
consisting  of  two  squares  and  four  rhombuses.  Several  such  octagons  are 
highlighted  in  Figure  5-5.  Because  these  octagons  have  eight  symmetrically 
distributed  edges,  a  representative  trajectory  from  all  four  possible  directions 
will  pass  through  every  octagon.  The  result  of  such  a  collision  is  defined  to 
be  a  particle  at  rest  somewhere  inside  the  octagon,  as  well  as  a  particle 
moving  with  velocity  \fi  in  the  direction  given  by  the  vector  sum  of  the 
two  incident  velocities.  The  inverse  process  occurs  when  a  particle  with 
velocity  %/2  enters  an  octagon  containing  a  rest  particle  and  produces  a  pair  of 
particles  with  velocity  1.  Note  that  energy  (velocity  squared)  is  conserved  in 
both  cases.  Both  \/2  -velocity  and  unit- velocity  particles  also  have  the  usual 
binary  collisions  indicated  in  Figure  5-7  as  well.  The  eightfold  symmetry  of 
the  octagonal  quasilattice  cellular  automaton  model  insures  the  isotropy  of 
all  tensors  with  rank  six  or  lower  in  a  Chapman-Enskog  expansion  of  the 
Navier-Stokes  equations. 

The  image  of  the  bonds  encompassing  one  of  the  octagons  discussed  above 
is  an  eight  pointed  star  when  projected  into  the  shadow  octagon  (see  Figure 
5-8).  By  translating  this  star  around  so  that  it  remains  completely  inside  the 
shadow  octagon,  one  obtains  the  set  of  all  octagons  which  could  be  used  as 
potential  collision  sites  in  physical  space. 

Computer  implementations  of  these  are  conveniently  viewed  as  shuffling 
particles  around  the  sites  of  a  regular  four-dimensional  lattice.  The  test  (5- 
14)  can  be  used  to  confine  the  particles  to  the  “physical”  subset  of  4-D  lattice 
points.  Particles  travel  on  jagged  paths  in  the  4-D  space  whose  projections 
correspond  to  the  trajectories  in  Figure  5-5. 


5.3  Icosahedral  Quasilattices 


It  is  straightforward  to  apply  these  ideas  in  three  dimensions  and  obtain 
cellular  automata  with  an  overall  icosahedral  symmetry,  thus  insuring  the 
isotropy  of  all  tensors  with  rank  four  and  lower.  In  analogy  with  Figure 
5-6,  Wolfram  has  considered  cellular  automata  models  obtained  from  the 
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Inverse: 


Figure  5-7.  Different  kinds  of  binary  collisions  on  an  octagonal  quasilatiice. 


intersections  of  sets  of  equally  spaced  planes  such  that  particles  move  in 
directions  given  either  by  the  vertices  of  a  regular  icosahedron,  or  by  its 
dual,  the  dodecahedron.1  Particles  can  then  flow  in  one  of  twelve  or  twenty 
symmetrically  arranged  channels.  In  our  view,  it  would  be  preferable  to 
have  particles  move  in  directions  piercing  the  midpoints  of  the  bonds  of  an 
icosahedron,  thus  obtaining  a  thirty  channel  model.  These  30  channels  can 
be  grouped  into  ten  coplanar  sets  of  six,  which  point  to  the  vertices  of  a 
regular  hexagon.  It  is  then  possible  to  equilibrate  momenta  across  channels 
using  triple  collisions  in  much  the  same  way  as  for  the  triangular  lattice.  Such 
simple  triple  collisions  are  not  possible  with  the  other  two  arrangements.1 

In  any  event,  we  think  it  may  be  better  to  obtain  icosahedral  quasilattices 
using  the  projection  technique,  rather  than  from  the  intersections  of  incom¬ 
mensurate  planes,  for  the  reasons  discussed  in  the  previous  subsection.  The 
projection  technology  is  well-developed  in  this  case,  because  it  has  been  used 
to  model  recently  discovered  metallic  alloys  with  an  icosahedral  symmetry.9 
One  now  tries  to  obtain  a  lattice  of  points  of  the  form 

r  =  Yl  Ro«i 

1=1 

where  six  {e,}  now  point  to  six  of  the  twelve  vertices  of  an  icosahedron  (the 
remaining  six  vertices  are  obtained  by  inversion).  The  allowed  sextets  of  inte¬ 
gers  {«i, . . . ,  n6}  are  now  obtained  by  projecting  a  six  dimensional  hypercubic 
lattice  into  three  dimensions.8,9,10  The  projection  matrix  into  physical  space 
is  now  given  by8 

11111 
1  y/E  1  -1  -1  1 

i  i  l—ii 

1—11  x/5  1  —1 

1  -1  -1  1  s/5  1 

1  1  —1—1  1 

and  points  are  projected  only  if  their  perpendicular  projection  (obtained  from 
Q  =  1  —  P)  falls  within  the  three  dimensional  shadow  of  a  6D  cube,  which 
is  the  interior  of  a  rhombic  triacontahedron. 

In  physical  space,  one  obtains  a  tiling  of  space  by  the  two  rhombahedra 
shown  in  Figure  5-9.  These  two  shapes  are  analogous  to  the  square  and 
rhombus  which  appear  in  Figure  5-5.  To  obtain  a  cellular  automaton,  one 


(5  -  20) 


(5-19) 
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Figure  5-9.  Prolate  and  oblate  rhombahedral  tiles  that  comprise  the  icosahedral 
quasilattice. 
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Figure  5-10.  Schematic  representation  of  a  triple  collision  in  a  plane  normal  to  a  three 
fold  symmetry  axis  of  triacontahedron. 
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allows  particles  to  hop  between  centers  of  rhombahedra,  across  a  set  of  rhom¬ 
bic  faces  with  identical  orientation.  The  set  of  jagged  paths  obtained  in  this 
way  leads  to  thirty  possible  momentum  channels.  Particles  suffering  head  on 
binary  collisions  within  a  particular  cell  exit  via  one  of  two  sets  of  parallel 
faces  different  from  those  by  which  they  entered. 

Just  as  one  finds  regular  octagons  in  the  2-D  quasilattice,  we  can  find 
many  regular  rhombic  triacontahedra  in  3-D.  These  triacontahedra  have 
thirty  rhombic  faces,  corresponding  to  the  thirty  channels  discussed  above; 
the  centers  of  the  faces  may  be  put  into  a  one  to  one  correspondence  with 
the  centers  of  the  bonds  of  a  regular  icosahedron.  Each  triacontahedron  is 
composed  of  ten  large  (prolate)  and  ten  smaller  (oblate)  rhombahedra.  A 
stereo  view  of  a  triacontahedron  filled  in  this  way  is  given  in  the  paper  by 
MacKay11. 

Momentum  conserving  triple  collisions  can  be  implemented  using  these 
triacontahedra  as  discussed  above:  Normal  to  each  of  the  ten  three-fold  sym¬ 
metry  axes  of  the  tricontrahedron  there  are  six  coplanar  momentum  channels 
pointing  to  the  vertices  of  a  hexagon.  Whenever  three  particles  enter  a  tria- 
contaOedron  as  sketched  in  Figure  5-10,  we  have  them  backscatter  into  the 
directions  from  whence  they  came. 

Implementations  are  possible  along  the  lines  sketched  for  the  octagonal 
quasilattice  in  Section  5.2. 
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APPENDIX  TO  SECTION  5 

OCTAGONAL  QUASILATTICE  PROGRAM 


The  follwoing  program  was  written  and  run  on  an  IBM  personal  computer 
equipped  with  a  high  resolution  graphics  card  and  an  8087  chip  to  speed  up 
arithmetic  operations.  Less  than  seven  minutes  were  required  to  draw  the 
321  vertex  lattice  shown  in  Figure  5-4.  The  program  was  written  using  the 
TURBO  Pascal  compiler  developed  for  the  IBM  PC  by  Borland  International, 
Scotts  Valley,  California. 


type  octagon. pas 

Appendix:  Octagonal  Ouasilattice  Program 

The  -following  program  was  written  and  run  on  an  IBM  personal 
computer  equipped  with  a  high  resolution  graphics  card  and  an  8087 
chip  to  speed  up  arithmetic  operations.  Less  than  seven  minutes  were 
required  to  draw  the  321  vertex  lattice  shown  in  Pig.  4.  The  program 
was  written  using  the  TURBO  pascal  compiler  developed  by  the  IBM  PC 
by  Borland  International ,  Scotts  Valley,  California. 

program  octagon; 


v  e»r 

e;  array! 1 . . 2 , 1 . . 4 3  of  real;  ep:  array C 1 . . 2, 1 . . 41  of  real; 
p:  array C 1 . . 4 , 1 . . 41  of  real;  q:  arrayC 1 . . 4 , 1 . . 41  of  real; 
rs  :  array[1..41  of  real; 

::0:  arrayC1..41  of  real; 

s , t , rx ,ry , sx , sy ,dl , d2 , d3, d4:  real ; 

n;  arraytl..4]  of  real;  m;  arrayC1..41  of  real; 

1  ,  J  ,f,l  ,u,v,w,r>:i  ,ryi  ,s::i  ,syi  ,  count;  integer; 

const 

phi  «  0.785398103;  r8  =  2.828427125;  r2  =  1.414213562;  dm  «  0.603553391; 


beg  1  n 

Hi  Res;  Hi ResCol or ( 1 5) ; 

{initialize  basis  vectors  in  2d  physical  space  and  2d  shadow  space! 
for  j  :  *=  1  to  4  do  begin 

eCI , jl  ;=  cos((j  -  0.5)*phi);  eC2,j3  :=  sinl  (j  -  0.5>*phi>; 

ep  C  1  ,  j  1  :  =  cos ( 3 . 0* ( j  -  0.5) *phi >  ;  ep!2,jl  ; »  sin(3.0*(j  -  0.5'*pHi> 

end ; 


{initialize  projection  matrices; 
q [  1  ,  1  ]  ;=  0.5;  qCl,23  ;=  -1.0/r8;  qCl,33 
q  C  2 , 1 1  ;=  - 1 . 0/ r 8 ;  q  C  2 , 2 1  ;=  0.5;  q!2,33 
q  C  3 , 1 3  : =  0.0;  q  C  3 , 2 3  ; »  -1.0/rB;  q  C  3 , 3  3 
qC4,l3  ;«  1.0/r8;  qC4,23  ;■  0.0;  q£4,33 


: *  0.0;  q  C 1 , 4  3  ; =  1.0/rB; 

:  =  -1.0/r8;  qC2,4j  ;«  0.0; 
; «  0.5;  q 1 3 , 4 3  :=  -1.0/rB; 
*  -1.0/rB  ;  q 1 4 , 43  ;»  0.5; 


{compute  displacement  vector! 
s  ;  =  0.1;  t  ;  »  0 .  {>5 ; 

: : 0 C 1 3  :=  s*0.5  -  t*0.5;  xOC23  :=  s*0.0  ♦  t*r2/2.0; 
;:OC33  i*  -s*0.5  -  t*0.5;  xOC43  ;=  s*r2/2.0  *  t*0.0; 


{test  all  lattice  sites  near  origin  of  Z4  for  inclusion  in  2d 
octagonal  latttice! 
count  ;»  0; 

for  1  :*  -4  to  4  do  begin 

for  j  ;=  -4  to  4  do  begin 

for  i  ; =  -4  to  4  do  begin 

for  I  :*  -4  to  4  do  begin 

nC13  :=  1;  nC23  ;  **  j;  n[33  :  *  );  nC43  ;■  1: 

for  u  ;=  1  to  4  do  begin 

r  s  C  u  3  : *  0.0; 

for  v  :=  1  to  4  do  begin 

r  s  C  u  3  :■*  rsCu3  +  qCu,v3*nCv3; 

end; 

rsLu3  : “  rslu3  ♦  x 0 [ u 3 ; 
end ; 
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Qi  lZi 


d  1  :=  rS£ 1 3*q£ 1 , 1 3  +  r  s  £  .:  3  *q  £  1 , 1  3  +  r s£ 3 3 *q E 7 , 1 3  +  r  s  [  4  3  *q  [  4 , 1  3  : 

d2  rs£13*q£l,23  -  rsC23*q£2,23  +  rs£73*q£7,23  ♦  r s C 4 3 *q C 4 . 2 3  ; 

d :  :=  rsL 1 3*q£ 1,23  +  rs£23*q£2,73  +  rs£33*q£7,73  +  r s C 4 3 *q £  4 , 7 3 ; 

d4  :=  rst 1 3*qC 1 ,43  +  r s E 2 3 *q C 2 , 4 3  +  rs£33*q£3,43  +  r s C 4 3 *q C 4 , 4 3 : 

i f  ( (abs (d 1 > ' dm)  and  (ebs <d2) ■ dm)  and  <abs<d3)\dm> 
and  (abs (d4)  -  dm) )  then  begin 

£if  site  ok  then  check  its  -four  neighbors  and  draw  3 1  ne  joining  them 
if  these  also  check  out! 

r>:  :=  e£l,13*n£13  *  e£l,23*n£23  +  ell,33*n£73  +  e£l,43*n£43s 
ry  :=  eC  2 , 1 3  *n  C 1 3  +  e£2.23*n£23  *  e£2,33*n£73  +  e£2,43*n£43s 
s:-  :=  ep  C  1  ,  1  3*m[  1  3  +  ep  £  1 , 2'3*m£  23 

♦  ep£l,73*m£73  +  ep£l,43*m£43s 
sy  :=  ep£2,13*m£13  +  ep£2,23*m£23 

+  ep£2,3J*m£33  +•  ep£2,43*m£43; 
s:  l  :=  480  +•  round  (45.u*s>:  > ;  syi  :=  50  -  r ound  ( 1 6. 0*sy )  ; 
pi  ot  <s::  1  ,  syi  ,  1  >  ; 
count  count  -*■  1; 
for-  w  :  =  1  to  4  do  begin 

m '  1  3  : =  is  m£ 2 3  »*  J5  m£73  :=  k;  m£43  :=  1; 

m£w3  5  =  m£w3  ♦  1; 

for  u  s*  1  to  4  do  begin 

rs£u3  sm  0.0s 

for  v  :=  1  to  4  do  begin 

ratul  s *  rs£u3  ♦  qCu,v3*m£v3; 

ends 

r  s  C  li  3  :  =  r  s  C  u  3  >:  0  Cul  s 

end : 

dl  :=  rs£13*q£l,13  ♦  rst23*q£2,13  ♦  rs£33*q£3,13  ■*  rs£43*q £4 , 1 3 s 

d2  :=  r s£ 1 3 *q £1,23  ♦  rsC23*q£2,23  ♦  rs£33*q£3,23  +  rs£43*q£4,23 5 

d3  :  =  rs£  1  3*q£  1 ,7*3  ♦  rs£23*qC2,33  rs£33*qC3,33  -*■  r s[43*q£4,73: 

d4  rs  £  1 3  *q  £1,43  -  rs£23*q£2,43  ♦  rs£33*qC3,43  +  rs£43*q£ 4 , a  3 ; 

if  1  labs <dl )  -  dm)  and  <abs(d2>  dm)  and  <absid3>  dm) 
and  <abs (d4> < dm' )  then  begin 

s  ••  : m  e£l,13*m£13  ♦  e£l,23*m£23 

+  e£l,33*mC33  ♦  eCl,43*m£43i 
s>-  e£ 2 , 1 3 *m£ 1 3  *  e £ 2 , 2 3 *m £ 2 3 

♦  e£2,33*m£3  3  e£2,43*m£43; 

r  :•  1  ;=  240  +  r  ound  (25. 0*r>: )  :  ryi  s“  110  -  round  (  10«ry>  ; 
s:i  :«  240  *  round (25. 0*sx ) ;  syi  : =  110  -  round ( 10*sy > ; 
dr  aw  <r>:  1  ,ryi  ,5.1  ,syi  ,1); 

end; 

end : 
ends 

ends  ends  end;  ends 

wr  i  tel n 1 ‘ total  number  of  vertices  is  ,count:6)s 
nd . 
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6  COMPARISON  BETWEEN  CONVENTIONAL 
KINETIC  THEORY  AND  CELLULAR  AU¬ 
TOMATA  DERIVATIONS  OF  HYDRO¬ 
DYNAMICS 


6.1  Introduction 


There  has  been  much  work  done  on  trying  to  solve  partial  differential 
equations  by  computing  the  evolution  of  properly  chosen  cellular  automata 
(CA1,4,5).  Most  attention  has  been  focused  on  the  Navier-Stokes  (N-S)  equa¬ 
tion  to  which  we  restrict  ourselves  here. 

The  essential  idea  of  CA  is  the  following.  At  any  instant  of  time  the  state 
of  the  automata  are  described  by  giving  the  numbers  of  “particles”  at  each 
point  of  a  fairly  regular  space-filling  lattice.  Each  of  the  particles  has  one  of 
a  discrete  set  of  velocities.  At  the  succeeding  time  step  the  state  is  changed 
so  that: 

1.  A  particle  with  a  given  velocity  moves  to  the  nearest  lattice  point  in 
the  direction  of  the  velocity. 

2.  Two  or  more  particles  can  have  “collided”  and  become  particles  moving 
with  other  of  the  discrete  velocities. 


The  claim  has  been  made  that  with  appropriate  choices  of  lattices  and 
collision  rules,  the  behavior  of  the  CA  when  sufficiently  averaged  represents 
solutions  of  the  N-S  equations.  Here  we  wish  to  investigate  this.  Discus¬ 
sions  of  the  process  have  been  given  elsewhere/1 ,4_6*  However,  we  thought  it 
to  be  useful  to  give  a  derivation  completely  in  parallel  with  a  conventional 
derivation  of  the  N-S  equations.  In  particular  we  try  to  be  as  general  as 
possible  -  for  example  not  specifying  lattice  or  dimension  whenever  possible. 
Some  (important)  technical  points  are  ignored.  Thus  the  question  of  how  to 
choose  parameters  so  that  the  isotropy  of  the  N-S  equations  is  obtained  is 
not  considered.  We  drop  the  restriction  frequently  made  that  at  most  one 
particle  with  a  given  velocity  can  be  at  a  site.  This  restriction  seems  mostly 
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to  simplify  numerical  computation.  It  is  rather  artificial  and  could  be  an 
additional  barrier  to  obtaining  a  desired  macroscopic  equation. 

In  the  following  sections  we  give  a  step  by  step  derivation  of  the 
N-S  equation  (denoted  as  “molecules”  in  the  text  headings  to  follow)  taken 
directly  from  reference^12).  At  each  step  we  then  obtain  the  corresponding 
equation  for  CA.  Discussion  of  similarities  and  differences  are  given  when 
appropriate. 


6.2  Equations  for  One  Particle  Distribution  Function 


6.2.1  Molecules 

In  principle  we  start  with  a  Hamiltonian  describing  all  particles  of  the 
system.  Then  a  distribution  function  /  (~i,  ~i  ~m,  ~m,  0  is  introduced. 
Integrating  the  Liouville  equation  over  all  particle  coordinates  but  one,  we 
obtain  an  equation  for  the  one  particle  distribution  function  ~i,  f)), 

~  +  V  •  V/  =  J„  (6-1) 

Jv  the  collision  term,  contains  all  reference  to  two  and  many  body  collisions. 
We  have  for  simplicity  omitted  any  external  potentials.  These  are  readily 
included. 

This  equation  is  really  no  particular  simplification  because  Jv  involves  the 
two  particle  distribution  function.  The  equation  for  this  involves  the  three 
particle  distribution  for  which  we  have  an  equation  involving  the  four  particle 
function  and  so  on  to  N.  The  usual  assumption  is,  following  Boltzmann,  that 
of  molecular  chaos.  This  is  to  the  effect  that  the  two  particle  function  can 
be  approximated  as  the  product  of  one  particle  functions.  This  means  that 
particles  before  and  after  collision  are  uncorrelated  -  which  is  reasonable 
if  the  correlation  function  falls  off  rapidly  in  space  and  time  (for  example, 
exponentially).  Recently*13^  this  has  been  found  not  to  be  necessarily  the 
case.  Power  law  behavior  has  been  found.  Consequently,  the  molecular  chaos 
assumption  which  we  will  make  can  only  be  assumed  safe  for  sufficiently  low 
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density.  This  does  not  mean  that  the  N-S  equations  do  not  hold  for  dense 
fluids  (e.g.  water).  Rather  our  derivation  is  then  suspect. 


With  the  assumption  we  have 


J.  =  j  <ft»  jdilgl(g,B){  f’fi  -//,].  (6-2) 

The  integrand  describes  the  collision  of  particles  of  velocity  ~  and  ~i  pro- 

v t  yt 

ducing  or  being  produced  by  paxticles  with  velocity  .  We  assume  mo¬ 
mentum  conservation 


V  ,  V  v'  .  V 

1  <■**/  ^ 


(6-3) 


and  energy  conservation 


,2 


1/2  ~  +l/2~,=  l/2(~  +l/2(~j)J- 


(6-4) 


(Here  all  molecules  are  assumed  to  have  unit  mass.)  I(g,  6 )  is  essentially  the 
differential  scattering  cross-section  and 

(6-5) 


The  conservation  laws  tell  us  that 

2 

J  Jv  d3v  =  J  ~  Jv  d3v  —  j  Jv  d3v  =  0. 


(6-6) 


6.2.2  C.A 

Describing  the  state  of  the  C.A.  by  a  distribution  function  /„(~,  t)  giving 
the  number  of  particles  at  t  with  velocity  ~0,  and  replacing  small  time 
and  space  differences  by  derivatives,  gives  the  3oltzmann-like  equation 

|  /.+  -V  /.  =  J..  (6  -  7) 

The  ~a  are  the  allowed  velocities  which  we  here  require  all  to  be  of  unit 
magnitude.  The  Ja  can  involve  two,  three,  four,  ....  products  of  the  /' s 
depending  on  the  collision  law  chosen.  To  have  an  analogy  with  Equation 
(6-6)  we  must  require  that  the  particle  number  and  momentum  are  conserved 
in  the  collison.  Then  we  have 

lja  =  0=l~aJa.  (6-8) 

We  always  have  one  less  such  equation  than  for  the  true  Boltzmann  equation. 
In  the  C.A.  models  “energy”  is  trivially  conserved. 
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6.3  Macroscopic  Conservation  Laws 


6.3.1  Molecules 


Define  for  any  V,(~) 


tf(~) 


f  4>(~)  f(v)d3v 
f  f(v)(Pv 


Multiplying  Equation  (6-1)  by  1,  and  ~  respectively  and  integrating  over 
~  gives  in  view  of  Equation  (6-6) 


dn  d 

—  +  — (nii,  =  0, 

at  oxi 


(6-9) 


and 


d,  x  8 

dt(nUi)  +  v'v> =  °’ 

partial  n  ,  d  rivTv 2 

+  ^ - =0. 


dt  2  '  dx,  2 

Eere  the  number  density  n  =  /  fePv  and  ~  is  the  average  velocity, 


(6-10) 

(6-11) 


•  U  /  X  .  V  V 

i.e.  ~  t )  — 


(6  -  12) 


Tue  Equations  (6-10)  and  (6-11)  can  be  put  in  a  more  useful  form  by  intro- 
d  icing  the  deviation  of  ~  from  its  mean 

—  ~  (~,<).  (6  —  13) 

1  hen,  for  example, 

Wvj  =  u,u3  -f  U,U}. 

Equation  (6-2)  becomes 


with 


d  .  x  8 


(6-14) 


Pii  =  n  U,U}. 


(6-15) 
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(6-16) 


Using  mass  conservation  Equation  (6-10)  simplifies  to 

du,  ]  d 

__ + ^  ,Vu,  ]  = -— 


n 


Similiarly  introducing  (/,  Equation  (6-3)  and  simplifying  using  Equations 
(6-3)  and  (6-14)  we  get 

+  (6-17) 


with 


and 


n  t/2 


Q  =  2  ~  '  ft  =  2  Ui  ~ 


(6-18) 

(6-19) 


6.3.2  C.A. 


Following  the  above  procedure  we  multiply  Equation  (6-7)  by  1  and  ~a 
and  sum  over  a.  We  obtain 


dn  d 

*  +  &-nu'  =  0 

(6  -  20) 

and 

5(m“)  +  W,  =  °- 

(6-21) 

Here 

n  =a  fa 

(6  -  22) 

and 

V  ~C 

a 

(6-23) 

Again  introducing  velocities  relative  to  the  mean 

U  e  u 

~o  =  ~o  —  ~ 

(6-24) 

and  using  Equation  (6-18)  to  simplify  Equation  (6-19)  we  obtain 

f  ,  u  ^-7  u  1  d  D 

n  {  Sj  +  ~  -V  Pa 

(6-25) 

with 

P„  =  n  Tit  ,T„  =  (£.)A)i- 

(6  -  26) 
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6.4  The  Euler  Equations 


6.4.1  Molecules 


We  are  interested  in  solutions  slightly  away  from  equilibrium.  Then  as  a 
zeroth  approximation  we  should  use  an  f°  such  that  the  collision  integral  is 
zero.  This  will  be  so  if  In  /  is  a  linear  function  of  the  conserved  quantities. 
This  leads  to 


f°  =  — 

J 


n 


{2irkT)W 


(~  _  ~) 
2kT 


(6  -  27) 


the  Maxwell  distribution.  (Note:  At  this  point  one  usually  invokes  the  H- 
theorem  to  further  justify  our  choice  of  P.  However,  to  our  Knowledge  there 
is  no  general  H-theorem  for  CA’s).  Notice  that  if  ~,  T  are  all  space-time 
independent  Equation  (6-25)  is  indeed  an  exact  solution  of  the  Boltzmann 
equation.  However,  even  if  ~,  T  are  space  time  functions  we  still  have 
Jv[f°]  =  0.  A  reasonable  lowest  order  approximation  for  our  macroscopic 
equations  is  to  use  /°  allowing  T ,  ~,  ~,  to  vary  and  then  calculate  the  quan¬ 
tities  occurring  in  Equations  (6-13)  and  (6-16).  One  obtains  the  set 

<7.  =  0,  Q  = 

Pij  =  Ph],  P  =  nkT 


dn  d 

m  +  d7iru,-  =  ° 

(6-28) 

»  {  ^  +  ~  V».}  =  -Vp 

(6  -  29) 

(| \ +  ~  -v)(nr-3'J)  =  0 

(6-30) 

which  are  the  Euler  equations  with  an  adiabatic  temperature  law  and  the 
ideal  gas  equation  of  state.  These  are  true  for  any  distribution  function  f° 
which  is  even  in  the  variable  ~  —  ~  (~,  t). 
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6.4.2  C.A. 


Here  again  it  is  true  that  if  In  f  is  a  linear  function  of  the  conserved 
constants,  then 

Ja[f}  =  0  (6-31) 

even  if  the  coefficients  of  the  linear  function  depend  on  ~i ,t. 


We  then  assume  as  a  lowest  order  approximation 


fa 


(6  -  32) 


Here  then  indeed  n  is  the  number  density.  ll>  is  a  vector  to  be  related  to  ~. 
Specifically: 


U 


a~a  C 

E  we 

a  e - - 


(6-33) 


Using  Equation  (6-30)  for  the  distribution  function  we  obtain  for  the  TtJ 
in  Equation  (6-26) 


£/e  ux/e  x 

rr  _  rro  a  (~o - ).  (~a  -u)j 

~  lij  -  E  “  e 

6  e - » 


(6  -  34) 


These  “Euler”  equations  then  differ  significantly  from  those  for  molecules: 


1.  The  parameter  ~  in  the  distribution  is  not  the  average  velocity. 

2.  The  pressure  tensor  now  has  a  rather  strange  dependence  on  ~. 


If  however,  ~  is  small  (i.e.|u;|  -C  1  )  we  can  expand  the  exponentials 


and 


(6  -  35) 


T(o)  _  a  (~q),  (j^ 

1  'J  E 

b  1 


(6-36) 
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Assuming  enough  symmetry  that  these  sums  over  a  are  isotropic  (see  Sections 
2  and  5)  we  find 


and 

with 


u  ~ 

d 

(6  -  37) 

P\j  ~  P^ij 

(6  -  38) 

n 

P=d ■ 

(6  -  39) 

(Here  d  is  the  number  of  spatial  dimensions)  i.e.  we  have  the  Euler  equations 
at  constant  temperature  kT  =  1/d. 


The  lesson  to  be  learned  at  this  stage  is  that  the  C.A.  are  describing  Euler 
hydrodynamics  at  constant  temperature  provided  |  ~  |  1.  This  restricts 

C.A.  models  to  small  Mach  numbers. 


6.5  The  Chapman-Enskog  Expansion 

6.5.1  Molecules 

We  have  noted  that  the  Maxwell-Boltzmann  distribution  function  satisfies 

Mf°]  =  0,  (6  -  40) 

even  if  ~,T  are  functions  of  ~,t.  However,  in  this  case  the  left  hand  side 
of  Equation  (6-1)  will  not  be  zero.  Consider  the  case  where  n,  T  are  slowly 
varying  on  the  spatial  scale  of  the  mean  free  path  and  time  scale  of  the  mean 
time  between  collisions.  We  proceed  so. 

Let 

/  =  rii + 4  (6-4i) 

Insert  this  in  Equation  (6-1).  On  the  left  hand  side  we  keep  only  f°  while 
on  the  right  we  keep  only  terms  linear  in  <f>.  The  result  is 

^  ~  -V/.  =  L.  (4 
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This  is  a  linear  integral  equation  for  4>.  Since  we  have  the  5  conserved  quan¬ 
tities,  there  are  5  zero  eigenvalues  of  Lv  corresponding  to  eigenfunctions 

v  2 

(6-42) 

For  Equation  (6-37)  to  have  a  solution  the  inhomogeneous  terms  on  the 
left  must  be  orthogonal  to  these.  But  these  conditions  are  just  our  “Euler” 
equations.  Hence  to  evaluate  the  left  hand  side  of  Equation  (6-37)  where 
n,T,  ~  depend  on  ~  and  t  we  use  the  Euler  equations  to  eliminate  I2, 
and  I2.  This  results  in  the  integral  equation 

<6-43) 

°^m~\ 1  = 

The  solution  will  be  unique  if  we  demand  that 

1 

/~  rw^= o 

\ 


For  a  general  scattering  cross-section  we  have  no  possibility  of  solving  this 
analytically.  However,  we  can  see  qualitatively  what  will  result.  Let  ^>t,  A,  be 
the  eigenfunction  of  Lv.  It  can  be  readily  shown  that  the  are  orthogonal 
to  each  other  with  weight  functions  /0,  i.e.  we  can  take  the  -0,  to  satisfy 

J  =  6t].  (6-44) 

The  first  5  of  the  V’i  correspond  to  the  eigenvalue  zero.  The  Equation  (6-39) 
is  satisfied  if  we  expand  so  that 


<t>  = 


(6  -  45) 


Further  we  note  that  the  will  be  functions  only  of  /V— —  rv  and  can  be 
chosen  to  be  even  or  odd  in  ~.  Then  the  coefficients  of  the  even  f>  notions 
are  determined  by  the  terms  ~  in  Equation  (6-38)  and  the  coefficients  of 
the  odd  functions  are  determined  by  the  terms  ~  Jj-.  The  net  result  is  that 

Pij  =  nkT  f>ij  +  cx(T)  (Dij  -  ^Dkk6ij),  (6-46) 


9. 


ci(T) 


dT_ 

dx{ 


77 


Note:  Here  C\  and  c?  are  independent  of  They  are  also  independent  of 
n.  This  last  property  results  from  the  fact  that  we  are  considering  only  two 
body  collisions.  In  Equation  (6-38)  there  is  one  factor  of  n  on  the  left  and 
two  on  the  right,  therefore 

<t>  ~  —  (6  —  47) 

n 

The  portions  of  Pi:  and  q,  which  depend  on  4>  are  then  n  independent. 


6.5.2  C.A. 


To  complete  the  macroscopic  Equations  (6-18)  and  (6-23)  we  need  to 
compute 

=  (6-48) 


_  °  (UMU.hf. 

~  £  , 
a  fa 

For  this  we  need  fa  to  be  obtained  from  Equation  (6-7).  Because  of  the 
conservation  laws 


un  =  o 


(6  -  49) 


where 


ne 


£ 

a  e~ 


(6  -  50) 


even  when  n  and  ~  (i.e.  ~)  are  functions  of  space  and  time. 


We  assume  these  are  slowly  varying  functions  and  try 

fa  =  /a°[l  +  <t>a\-  (6-51) 

Inserting  this  in  Equation  (6-7),  keeping  only  the  first  term  on  the  left  and 
terms  linear  in  <f>  on  the  right  gives 

^+~a-V/a°  =  £aM  (6-52) 

which  is  an  inhomogeneous  linear  equation  for  <f>.  The  inhomogeneous  terms 
are  given  by  the  derivatives  of  /“.  The  linear  operator  La  has  zero  eigenvalues 
corresponding  to  the  number  and  momentum  conservation  laws.  Thus  for 
Equation  (6-46)  to  have  solutions  we  must  require  that 

=~  'V/:)  =  0.  (6  -  53) 
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For  the  solution  to  be  unique  we  require  <j>  to  be  orthogonal  to  the  eigen¬ 
functions  of  zero  eigenvalue. 

The  Equation  (6-47)  are  just  our  Euler  equations.  In  Section  6.1  we  used 

o  ou 

these  to  eliminate  ^  and  ^  from  the  analog  of  Equation  (6-41  \  Unfortu¬ 
nately  it  is  not  the  derivatives  of  ~which  occur  here  but  rather  tnose  of  ~. 
We  thus  obtain 


'  n  dxi 

(  t  u  \  /  e  u  \  diiJj  dtii 

-  (~.  -  ~),  (~.  -  -)*  ^ 

where  T//'  is  as  given  in  Equation  (6-31). 

However,  this  can  be  simplified.  Thus,  since 


i-  (6-54) 


u,  = 


E  «*>  e 

a  e - • 


dui  du>j 

^  ”  d'k~~d7k 


_7».  =  - 


(~.  -  -),  -  (~.  -  ~); 

(nr/- 


(6-55) 


(6-56) 


Then  Equation  (6-48)  becomes 


(|  +  '.■v)/:  =  H-i+(~.-~Mr),^  (6-57) 


-  (t-Wr1?)/:- 


dx* 

Finally  we  note  that  after  some  calculation  we  can  show  that 


4r  (**)7. 


<9u„ 


<9x 


/;m 


partialik 


where 


rpO  (  *  U  \  f  e  u  \  /  e  U\ 

1ikj  =  (~a - ).  (~o - )j  (~o - )*. 


(6-58) 


(6  -  59) 
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Thus  our  linear  equation  for  (f>  is 


rpQ 

mkj 


V-  ~  +[(~a  -  ~)j(~a  -  ~)jfc  -  (~a  ~  ~)m{T°)nn  (6-60) 

(T0)-1  —  ) 

{  h'  dxk}  Ja 


The  complicated  dependence  on  ~  of  this  equation  should  be  compared 
with  the  simple  form  of  Equation  (6-38).  Indeed  the  T°  depend  on  ~and 
this  is  an  implicit  function  of 


To  see  what  is  happening  we  look  at  this  equation  for  small  ~.  The  lowest 
order  terms  in  the  {  }  of  Equation  (6-53)  are: 

{  }  =  d  -jfa  }<U  (6-61) 


This  should  be  compared  to  the  terms  in  Equation  (6-38)  left  when  §~-  =  0. 
We  see  this  will  lead  to  a  pressure  tensor  of  the  form  of  Equation  (6-42)  (As 
explained  above  the  Cj  will  not  now  necessarily  be  independent  of  n.) 


If  we  look  at  second  order  terms  we  obtain 


{  h  = 


+ 


2d 

d+2 

cP 

d  -f-  2 


(  (~a)*«, 


dut 

dxk 


(6-62) 


(6-63) 


The  first  two  terms  of  this  expression  will  give  rise  to  terms  which  have  nn 
analogue  in  the  N-S  equation.  The  last  term  gives  rise  to  a  modification  of 
the  convection  term  in  the  N-S  equation. 


6.6  Conclusion 


The  class  of  Cellular  Automata  considered  here  can  model  low  veloc¬ 
ity  (i.e.  incompressible)  Navier-Stokes  flows,  for  which  the  Mach  number  is 
<<  1.  Sound  waves  can  be  modeled,  though  they  require  a  sound  velocity 
that  is  specified  arbitrarily.  When  calculations  are  made  for  Mach  num¬ 
bers  approaching  unity,  the  results  are  questionable.  Some  non-linear  partial 
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differential  equations  are  being  solved  but  they  are  not  the  Navier-Stokes 
equations. 

In  addition,  we  note  that  for  both  the  Navier-Stokes  and  cellular  au¬ 
tomata  systems,  the  Chapman-Enskog  expansion  assumes  that  typical  scales 
for  spatial  variation  are  long  compared  with  the  collision  mean  free  path. 
For  fluid  models  this  condition  is  easily  satisfied  since  the  typical  cell  size  is 
usually  kept  large  compared  to  a  mean  free  path. 

The  situation  is  more  difficult  for  cellular  automata.  Here  the  effective 
mean  free  path  is  generally  a  few  times  the  distance  between  adjacent  lattice 
points.  Thus  care  must  be  taken  that  external  sructures  and  boundaries 
introduced  into  the  flow  be  already  smoothed  or  rounded  to  the  required 
spatial  scales,  since  the  physics  of  the  lattice  model  alone  will  not  guarantee 
slow  enough  variation  of  fluid  quantities. 

This  requirement  for  smooth  variation  of  boundaries  and  other  structures 
in  the  flow  has  some  interesting  consequences.  Although  cellular  automata 
models  may  be  useful  in  the  study  of  boundary  layers  for  which  the  boundary 
geometry  is  complex,  as  suggested  in  Section  4,  the  Chapman-Enskog  expan¬ 
sion  requires  that  boundary  surfaces  of  a  cellular  automata  model  must  vary 
slowly  relative  to  the  lattice  spacing.  They  cannot  be  too  jagged  or  bumpy. 
This  raises  interesting  questions  about  recent  two-dimensional  simulations  of 
flow  past  a  plate,  for  example,  because  at  the  abrupt  ends  of  the  plate  there  is 
a  region  where  these  assumptions  are  violated,  as  illustrated  in  Figure  6-1.  In 
a  fan-like  region  emanating  from  the  corners  of  the  plate,  the  Navier-Stokes 
equations  are  not  being  well  modeled.  It  would  therefore  be  interesting  to 
repeat  the  calculation  using  a  finite  thickness  plate  with  rounded  ends,  to  see 
whether  the  previous  sharp  edges  affected  the  downstream  vortex  structure 
or  the  long-time  behavior  of  the  flow. 
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Figure  6-1. 


Downstream 

Vortices 


Outgoing 

Fluid 


Sharp  edges  in  cellular  automata  computation  of  flow  past  a  plate  raise  questions  about 
validity  of  Chapman-Enskog  expansion  near  comers. 
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7  PACKING  A  PLANAR  LATTICE 


Cellular  automata  have  been  proposed  for  problems  whose  initial  data 
have  a  natural  parameterization  by  a  metrical  surface,  usually  the  Euclidian 
plane  E2.  Such  applications  include  the  study  of:  global  weather,  vision, 
two-dimensional  fluid  flow.  The  physical  realization  of  such  C.A.’s  suggests 
an  optimization  problem  in  discrete  geometry. 

Let  Sx  be  the  square  of  side  =  x  in  the  usual  planar  lattice  Z  +  Z  and  Cv 
the  cube  of  side  y  in  the  cubical  lattice  Zz.  Our  problem  is  to  represent  the 
model  Sx  in  the  computer  Cy  so  that:  (1)  model- adjacent  nodes  can  exchange 
information  quickly,  and  (2)  a  high  density  A  =  y3/x 2  is  achieved. 

On  Sx  we  take  the  “city  street”  metric  11  xll  =  li0l-(-liil.  We  imagine 
Cy  to  be  composed  of  horizontal  layers  or  leaves  within  which  communication 
time  is  essentially  distance  (e.g.,  as  defined  on  Sx).  Communication  between 
distinct  layers  only  occurs  between  vertically  adjacent  nodes  at  the  surface 
of  the  cube.  (We  fix  a  constant  c  to  be  the  “distance”  between  such  pairs.) 

min 

This  determines  a  metric  on  Cy,  dist  (p,q)  =  7  (#  (horizontal  edges  of7  )+ 
c#  (vertical  edges  of  7  )  )  where  7  is  an  edge  path  from  p  to  q  whose  only 
vertical  edges  are  on  the  cube’s  surface.  Fabrication  and  heat  dissipation 
problems  make  this  model  more  realistic  than  a  homogeneous  cube. 

It  can  be  proved  [14]  that  any  one  to  one  map  /  :  Sx  —*  Cy  which  fills 
the  cube  with  a  fixed  density  p  >  0  must  produce  a  distortion  of  distance 
(=  communication  time)  which  grows  as  c1^2  const.  ( p )  x1^16.  For  p  near 
zero  the  constant  is  at  least  for  p  near  1  the  constant  is  >  1  / 10.) 
Furthermore,  we  explicitly  define  a  map  F:  Sx  — *  Cy  with  density  1  j  1/6- 
power  stretching,  and  small  leading  coefficient  (stretchp  ~  \f2E*2  x1//6  ). 

The  map  F  describes  the  best  way  to  cut  a  two-dimensional  data  set 
apart  and  reassemble  it  on  a  stack  of  trays.  Dicing  into  squares  and  stacking 
is  not  optimal  since  it  concentrates  all  the  stretching  in  the  vertical  direction 
(stretch  ss  x1/3  arises).  The  idea  in  constructing  F  is  to  share  the  necessary 
stretching  equally  (hence  in  the  amount  x  )  between  horizontal  and  ver¬ 
tical  stretching;  these  add  -  not  multiply  -  to  give  the  total  stretching.  For 
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convenience  we  fix  c  =  2  and  assume  x1^6  is  an  integer;  here  is  the  definition 
of  F: 

F-1(e ach  z-level  set)  is  a  x5^6  by  x1/2  rectangle  Rx s/e  x1^2  .  (7  —  1) 


Define 

h  (p,  9)  =  ( P  (Po)  [pi],  9  +  Vo) 

where 

p  =  PQx2'3  +  PU  0  =  P,  =  x2/3  -  1 

and 

p{p0)£Z2) 

the  group  of  two  elements.  The  nontrivial  element  is  the  permutation: 


0  1  z 2/2  1 

1  1  .  i 

*2/3  j  z2/a  2  0 


and  cr(odd)  is  nontrivial,  a  (even)  is  trivial.  The  map  h  is  a  bijection  which 
stretches  by  no  more  than  a  factor  of  x1/6.  Now  set: 

F(u,v)  =  (/^'(lio^tij],  <r"(u0)[uij),ue  +  x1/6v0), 

where 

u  =  u0i5^6  -f  «i,  0  =  uj  =  x5'6  —  1 

and 

V  =  V0X1^2  -f-  U!,0  =  Ui  =  x1/2  —  1 

cr1  (odd)  and  er”  (odd)  are  permutations  which  (resp.)  reverse  the  ordered 
sets: 

As  a  final  note,  if  Cy  is  given  the  more  homogeneous  metric  ||  x  ||  = 
I  Xi  |  +  1^2 1  +  |  x2  |  +  |x3|  then  packings  with  p  ss  1  and  stretching  =  3 
(independent  of  x)  can  be  constructed.  I  thank  J.  Komlos  for  this  sharp 
bound  on  stretching. 
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